Memory knob testing (was Re: Let's make PostgreSQL multi-threaded)

2023-10-11 Thread Merlin Moncure
On Fri, Aug 25, 2023 at 8:35 AM Stephen Frost  wrote:

> Greetings,
>
> This is getting a bit far afield in terms of this specific thread, but
> there's an ongoing effort to give PG administrators knobs to be able to
> control how much actual memory is used rather than depending on the
> kernel to actually tell us when we're "out" of memory.  There'll be new
> patches for the September commitfest posted soon.  If you're interested
> in this issue, it'd be great to get more folks involved in review and
> testing.
>

Noticed I missed this.  I'm interested.  Test #1 would be to set the memory
limit to about the max there is, maybe a hair under, turn off swap, and see
what happens in various dynamic load situations.

Disabling overcommit is not a practical solution in my experience; it moves
instability from one place to another and seems to make problems appear in
a broader set of situations.  For zero-downtime platforms it has its place,
but I would tend to roll the dice on a reboot even for direct user-facing
applications, given that a reboot can provide relief for systemic conditions.

My unsophisticated hunch is that postgres and the kernel are not on the
same page about memory somehow and that the multi-process architecture
might be contributing to that issue.  Of course, regarding
rearchitecture skeptically and realistically is a good idea given the
effort and risks.

I guess, in summary, I would personally rate things like better management
of resource tradeoffs, better handling of transient demands, predictable
failure modes, and stability in dynamic workloads over things like better
performance in extremely high concurrency situations.  Others might think
differently for objectively good reasons.

merlin


Re: Let's make PostgreSQL multi-threaded

2023-08-25 Thread Stephen Frost
Greetings,

* David Geier (geidav...@gmail.com) wrote:
> On 8/11/23 14:05, Merlin Moncure wrote:
> > Hm, noted this upthread, but asking again, does this
> > help/benefit interactions with the operating system make oom kill
> > situations less likely?   These things are the bane of my existence, and
> > I'm having a hard time finding a solution that prevents them other than
> > running pgbouncer and lowering max_connections, which adds complexity. 
> > I suspect I'm not the only one dealing with this.   What's really scary
> > about these situations is they come without warning.  Here's a pretty
> > typical example per sar -r.
> > 
> > The conjecture here is that lots of idle connections make the server
> > appear to have less memory available than it looks, and sudden transient
> > demands can cause it to destabilize.
> 
> It does in the sense that your server will have more memory available in
> case you have many long living connections around. Every connection has less
> kernel memory overhead if you will. Of course even then a runaway query will
> be able to invoke the OOM killer. The unfortunate thing with the OOM killer
> is that, in my experience, it often kills the checkpointer. That's because
> the checkpointer will touch all of shared buffers over time which makes it
> likely to get selected by the OOM killer. Have you tried disabling memory
> overcommit?

This is getting a bit far afield in terms of this specific thread, but
there's an ongoing effort to give PG administrators knobs to be able to
control how much actual memory is used rather than depending on the
kernel to actually tell us when we're "out" of memory.  There'll be new
patches for the September commitfest posted soon.  If you're interested
in this issue, it'd be great to get more folks involved in review and
testing.

Thanks!

Stephen



Re: Let's make PostgreSQL multi-threaded

2023-08-25 Thread David Geier

Hi,

On 8/11/23 14:05, Merlin Moncure wrote:
> On Thu, Jul 27, 2023 at 8:28 AM David Geier  wrote:
> > Hi,
> >
> > On 6/7/23 23:37, Andres Freund wrote:
> > > I think we're starting to hit quite a few limits related to the process
> > > model, particularly on bigger machines. The overhead of cross-process
> > > context switches is inherently higher than switching between threads in
> > > the same process - and my suspicion is that that overhead will continue
> > > to increase. Once you have a significant number of connections we end up
> > > spending a *lot* of time in TLB misses, and that's inherent to the
> > > process model, because you can't share the TLB across processes.
> >
> > Another problem I haven't seen mentioned yet is the excessive kernel
> > memory usage because every process has its own set of page table entries
> > (PTEs). Without huge pages the amount of wasted memory can be huge if
> > shared buffers are big.
>
> Hm, noted this upthread, but asking again, does this
> help/benefit interactions with the operating system make oom kill
> situations less likely?   These things are the bane of my existence,
> and I'm having a hard time finding a solution that prevents them other
> than running pgbouncer and lowering max_connections, which adds
> complexity.  I suspect I'm not the only one dealing with this.
> What's really scary about these situations is they come without
> warning.  Here's a pretty typical example per sar -r.
>
> The conjecture here is that lots of idle connections make the server
> appear to have less memory available than it looks, and sudden
> transient demands can cause it to destabilize.


It does in the sense that your server will have more memory available in 
case you have many long living connections around. Every connection has 
less kernel memory overhead if you will. Of course even then a runaway 
query will be able to invoke the OOM killer. The unfortunate thing with 
the OOM killer is that, in my experience, it often kills the 
checkpointer. That's because the checkpointer will touch all of shared 
buffers over time which makes it likely to get selected by the OOM 
killer. Have you tried disabling memory overcommit?
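
A minimal sketch, in Python, of one way to see how close a Linux host is to its commit limit before experimenting with overcommit settings. It parses text in the /proc/meminfo format and returns CommitLimit minus Committed_AS. The function name and the sample figures are made up for illustration and are not taken from any server in this thread; note also that the kernel only enforces CommitLimit when vm.overcommit_memory is set to 2.

```python
def commit_headroom_kb(meminfo_text: str) -> int:
    """Return CommitLimit - Committed_AS (in kB) from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields["CommitLimit"] - fields["Committed_AS"]


# Hypothetical sample values, chosen only to keep the arithmetic obvious.
SAMPLE = """\
MemTotal:       16000000 kB
CommitLimit:    12000000 kB
Committed_AS:   11500000 kB
"""

headroom = commit_headroom_kb(SAMPLE)
print(headroom, "kB of commit headroom")
```

On a real host the input would come from reading /proc/meminfo; a negative result means more memory has been promised than the limit allows, which is only possible when strict overcommit is off.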


--
David Geier
(ServiceNow)





Re: Let's make PostgreSQL multi-threaded

2023-08-23 Thread Mark Woodward
On Mon, Jun 12, 2023 at 5:17 PM Heikki Linnakangas  wrote:

> On 10/06/2023 21:01, Hannu Krosing wrote:
> > On Mon, Jun 5, 2023 at 4:52 PM Heikki Linnakangas 
> wrote:
>
> <<>>

>
> > * The backend code would be more complex.
> > -- this is still the case
>
> I don't quite buy that. A multi-threaded model isn't inherently more
> complex than a multi-process model. Just different. Sure, the transition
> period will be more complex, when we need to support both models. But in
> the long run, if we can remove the multi-process mode, we can make a lot
> of things *simpler*.
>

If I may weigh in here:
Making a previously unthreaded process able to handle multiple threads is
a tedious process.

>
> > -- even more worrisome is that all extensions also need to be rewritten
>
> "rewritten" is an exaggeration. Yes, extensions will need to adapt, similar
> to the core code. But I hope it will be pretty mechanical work, marking
> global variables as thread-local and such. Many extensions will work
> with little to no changes.
>

I can tell you from experience it isn't that easy.  In my career I have
taken a few "old" technologies and made them multithreaded, and it is really
a complex and laborious undertaking.
Many operations that work just fine without threads will break in a
multithreaded system. You need to make sure every function in every library
that you use is "thread safe."  Take a file handle: reading, seeking, or
writing through it is fine in a single process, but breaks in a
multithreaded environment if the file handle is shared, because the threads
share one file position. That's a very simple example. OpenSSL operations
will almost certainly break, and you will need to rewrite your SSL stuff and
protect some things with mutexes. When you fork(), a lot is essentially
duplicated (COW) between the parent and child that will ultimately be shared
in a threaded model. Decades-old assumptions in the design and architecture
will break, and you will need to rethink what you are doing and how it is
done. You will need to change file handling to get beyond the 1024 file
descriptor limit in calls like select(). There is a LOT of this kind of
stuff; it is not mechanical. I even call into question "Many extensions will
work with little to no changes," as those too will need to be audited for
thread safety.  Think about loading extensions: extensions are typically not
loaded until they are used. In a multi-threaded model, a shared library will
only be loaded once. Think about memory management: you will have multiple
threads fighting over the global heap as they allocate memory.  The list is
virtually endless.
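
The shared file handle example can be made concrete with a small Python sketch (Unix only). A descriptor has a single kernel-side offset, so a seek followed by a read from two threads can interleave. A positional read (os.pread, which wraps POSIX pread) passes the offset with each call and never touches the shared position. The helper name, block size, and temporary file are illustrative assumptions; this is not PostgreSQL code.

```python
import os
import tempfile
import threading


def read_block(fd: int, block_no: int, block_size: int = 8) -> bytes:
    # pread takes the offset explicitly, so threads sharing fd cannot
    # disturb each other through the descriptor's file position.
    return os.pread(fd, block_size, block_no * block_size)


results = {}

with tempfile.TemporaryFile() as f:
    # 100 fixed-width blocks: b"00000000", b"00000001", ...
    f.write(b"".join(b"%08d" % i for i in range(100)))
    f.flush()
    fd = f.fileno()

    def worker(n: int) -> None:
        results[n] = read_block(fd, n)

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(100)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

all_ok = all(results[n] == b"%08d" % n for n in range(100))
print("every thread read its own block:", all_ok)
```

The same workers written with os.lseek followed by os.read would need a lock around each pair of calls to be correct, which is the kind of audit the paragraph above is describing.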


>
> > -- and many incompatibilities will be silent and take potentially years
> to find
>
> IMO this is the most scary part of all this. I'm optimistic that we can
> have enough compiler support and tooling to catch most issues. But we
> don't know for sure at this point.
>

We absolutely do not know and it *is* very scary.


>
> > * Terminating backend processes allows the OS to cleanly and quickly
> > free all resources, protecting against memory and file descriptor
> > leaks and making backend shutdown cheaper and faster
> > -- still true
>
> Yep. I'm not too worried about PostgreSQL code, our memory contexts and
> resource owners are very good at stopping leaks. But 3rd party libraries
> could pose hard problems. IIRC we still have a leak with the LLVM JIT
> code, for example. We should fix that anyway, of course, but the
> multi-process model is more forgiving with leaks like that.
>
> Again, we believe that this is true.


> --
> Heikki Linnakangas
> Neon (https://neon.tech)


Re: Let's make PostgreSQL multi-threaded

2023-08-11 Thread Merlin Moncure
On Thu, Jul 27, 2023 at 8:28 AM David Geier  wrote:

> Hi,
>
> On 6/7/23 23:37, Andres Freund wrote:
> > I think we're starting to hit quite a few limits related to the process
> model,
> > particularly on bigger machines. The overhead of cross-process context
> > switches is inherently higher than switching between threads in the same
> > process - and my suspicion is that that overhead will continue to
> > increase. Once you have a significant number of connections we end up
> spending
> > a *lot* of time in TLB misses, and that's inherent to the process model,
> > because you can't share the TLB across processes.
>
> Another problem I haven't seen mentioned yet is the excessive kernel
> memory usage because every process has its own set of page table entries
> (PTEs). Without huge pages the amount of wasted memory can be huge if
> shared buffers are big.


Hm, noted this upthread, but asking again, does this
help/benefit interactions with the operating system make oom kill
situations less likely?   These things are the bane of my existence, and
I'm having a hard time finding a solution that prevents them other than
running pgbouncer and lowering max_connections, which adds complexity.  I
suspect I'm not the only one dealing with this.   What's really scary about
these situations is they come without warning.  Here's a pretty typical
example per sar -r.

         kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
14:20:02    461612  15803476    97.16         0 11120280 12346980   60.35 10017820 4806356     220
14:30:01    378244  15886844    97.67         0 11239012 12296276   60.10 10003540 4909180     240
14:40:01    308632  15956456    98.10         0 11329516 12295892   60.10 10015044 4981784     200
14:50:01    458956  15806132    97.18         0 11383484 12101652   59.15  9853612 5019916     112
15:00:01  10592736   5672352    34.87         0  4446852  8378324   40.95  1602532 3473020     264   <-- reboot!
15:10:01   9151160   7113928    43.74         0  5298184  8968316   43.83  2714936 3725092     124
15:20:01   8629464   7635624    46.94         0  6016936  8777028   42.90  2881044 4102888     148
15:30:01   8467884   7797204    47.94         0  6285856  8653908   42.30  2830572 4323292     436
15:40:02   8077480   8187608    50.34         0  6828240  8482972   41.46  2885416 4671620     320
15:50:01   7683504   8581584    52.76         0  7226132  8511932   41.60  2998752 4958880     308
16:00:01   7239068   9026020    55.49         0  7649948  8496764   41.53  3032140 5358388     232
16:10:01   7030208   9234880    56.78         0  7899512  8461588   41.36  3108692 5492296     216

Triggering query was heavy (maybe even runaway), server load was minimal
otherwise:

              CPU   %user   %nice %system %iowait  %steal   %idle
14:30:01      all    9.55    0.00    0.63    0.02    0.00   89.81
14:40:01      all    9.95    0.00    0.69    0.02    0.00   89.33
14:50:01      all   10.22    0.00    0.83    0.02    0.00   88.93
15:00:01      all   10.62    0.00    1.63    0.76    0.00   86.99
15:10:01      all    8.55    0.00    0.72    0.12    0.00   90.61

The conjecture here is that lots of idle connections leave the server with
less memory available than it appears to have, and sudden transient demands
can cause it to destabilize.

Just throwing it out there: if it can be shown to help, it may be supportive
of moving forward with something like this, either instead of, or along
with, O_DIRECT or other internalized database memory management
strategies.  Lowering context switches, faster page access, etc. are of
course nice, but would not be a game changer for the workloads we see, which
are pretty varied (OLTP, analytics), although we don't see extremely high
transaction rates.

merlin


Re: Let's make PostgreSQL multi-threaded

2023-07-28 Thread Matthias van de Meent
On Thu, 15 Jun 2023 at 11:07, Hannu Krosing  wrote:
>
> One more unexpected benefit of having shared caches would be easing
> access to other databases.
>
> If the system caches are there for all databases anyway, then it
> becomes much easier to make queries using objects from multiple
> databases.

We have several optimizations in our visibility code that allow us to
remove dead tuples from this database when another database still has
a connection that has an old snapshot in which the deleting
transaction of this database has not yet committed. This is allowed
because we can say with confidence that other database's connections
will never be able to see this database's tables. If we were to allow
cross-database data access, that would require cross-database snapshot
visibility checks, and that would severely hinder these optimizations.
As an example, it would increase the work we need to do for snapshots:
For the snapshot data of tables that aren't shared catalogs, we only
need to consider our own database's backends for visibility. With
cross-database visibility, we would need to consider all active
backends for all snapshots, and this can be significantly more work.
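
A toy model of the difference, in Python. It is a deliberately simplified sketch and not how PostgreSQL computes its horizons: the Backend class, the field names, and the transaction IDs are all invented for illustration, and transaction ID wraparound is ignored. The point is only that restricting the scan to one database's backends can give a much newer removal horizon than scanning every backend.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Backend:
    database: str
    xmin: Optional[int]  # oldest transaction ID this backend's snapshot may need


def removal_horizon(backends: Iterable[Backend],
                    database: Optional[str] = None) -> Optional[int]:
    """Oldest xmin among the backends that matter, or None if there is none.

    With database set, only that database's backends are considered.
    With database=None, every backend counts (the cross-database case).
    """
    xmins = [b.xmin for b in backends
             if b.xmin is not None
             and (database is None or b.database == database)]
    return min(xmins) if xmins else None


backends = [
    Backend("sales", 900),
    Backend("sales", 950),
    Backend("reports", 100),  # long-running snapshot in another database
]

per_database = removal_horizon(backends, "sales")
cluster_wide = removal_horizon(backends)
print(per_database, cluster_wide)
```

In this made-up example, tuples in "sales" deleted by transactions between 100 and 900 can be cleaned up under the per-database rule but would have to be kept if the "reports" snapshot could see them.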

Kind regards,

Matthias van de Meent
Neon (https://neon.tech/)




Re: Let's make PostgreSQL multi-threaded

2023-07-27 Thread David Geier

Hi,

On 6/7/23 23:37, Andres Freund wrote:

> I think we're starting to hit quite a few limits related to the process model,
> particularly on bigger machines. The overhead of cross-process context
> switches is inherently higher than switching between threads in the same
> process - and my suspicion is that that overhead will continue to
> increase. Once you have a significant number of connections we end up spending
> a *lot* of time in TLB misses, and that's inherent to the process model,
> because you can't share the TLB across processes.


Another problem I haven't seen mentioned yet is the excessive kernel 
memory usage because every process has its own set of page table entries 
(PTEs). Without huge pages the amount of wasted memory can be huge if 
shared buffers are big.


For example with 256 GiB of used shared buffers a single process needs 
about 256 MiB for the PTEs (for simplicity I ignored the tree structure 
of the page tables and just took the number of 4k pages times 4 bytes 
per PTE). With 512 connections, which is not uncommon for machines with 
many cores, a total of 128 GiB of memory is just spent on page tables.
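
The estimate above can be reproduced with a few lines of Python. This is a sketch under the same simplification stated in the text (one flat entry per 4k page, 4 bytes per PTE, upper page-table levels ignored); the function and parameter names are made up for illustration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def page_table_bytes(mapped_bytes: int, page_size: int = 4096,
                     pte_size: int = 4) -> int:
    # Flat estimate: one PTE per mapped page, ignoring the tree structure.
    return (mapped_bytes // page_size) * pte_size


per_process = page_table_bytes(256 * GIB)
total = per_process * 512  # 512 connections, each with its own page tables

print(per_process // MIB, "MiB per process")
print(total // GIB, "GiB across 512 connections")
```

On x86-64 a page table entry is actually 8 bytes, so calling page_table_bytes(256 * GIB, pte_size=8) doubles both figures. With 2 MiB huge pages (page_size=2 * MIB) the same mapping needs only a few hundred KiB per process, which is why huge pages are the usual workaround.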


We used non-transparent huge pages to work around this limitation but 
they come with plenty of provisioning challenges, especially in cloud 
infrastructures where different services run next to each other on the 
same server. Transparent huge pages have unpredictable performance 
disadvantages. Also if some backends only use shared buffers sparsely, 
memory is wasted for the remaining, unused range inside the huge page.


--
David Geier
(ServiceNow)





Re: Let's make PostgreSQL multi-threaded

2023-07-19 Thread Ashutosh Bapat
I think the planner would also benefit from threads. There are many tasks
in the planner that are independent and can be scheduled using a dependency
graph. They are too small to be parallelized through separate backends
but large enough to be performed by threads. Planning queries
involving partitioned tables takes longer (on the order of seconds), esp.
when there are thousands of partitions. That kind of planning would
benefit immensely from threading. Of course we could use backends that
pull tasks from a queue, but sharing the PlannerInfo and its
substructures is easier through the same address space than through
shared memory.
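
A rough sketch of what scheduling independent tasks from a dependency graph onto threads could look like, written in Python with the standard graphlib and concurrent.futures modules. The task names are hypothetical and the work function does nothing; this shows only the scheduling pattern, not planner code. (CPython threads also do not run CPU-bound work in parallel, so this illustrates the structure rather than the speedup.)

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_task_graph(deps, run, max_workers=4):
    """Run each task once all of its prerequisites have finished.

    deps maps a task name to the set of task names it depends on.
    Returns the task names in the order they completed.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    finished = []
    pending = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit everything whose prerequisites are already done.
            for task in sorter.get_ready():
                pending[pool.submit(run, task)] = task
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for fut in done:
                fut.result()  # re-raise any exception from the worker
                task = pending.pop(fut)
                finished.append(task)
                sorter.done(task)  # may unblock dependent tasks
    return finished


# Hypothetical planning steps: three partitions planned independently,
# then one step that needs all three results.
deps = {
    "plan_part_1": set(),
    "plan_part_2": set(),
    "plan_part_3": set(),
    "build_append": {"plan_part_1", "plan_part_2", "plan_part_3"},
}
order = run_task_graph(deps, run=lambda name: None)
print(order)
```

Because all workers share one address space, the run callback could read and update a common in-memory structure directly, which is the convenience the message attributes to threads over separate backends.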

On Sat, Jun 10, 2023 at 5:25 AM Bruce Momjian  wrote:
>
> On Wed, Jun  7, 2023 at 06:38:38PM +0530, Ashutosh Bapat wrote:
> > With multiple processes, we can use all the available cores (at least
> > theoretically if all those processes are independent). But is that
> > guaranteed with single process multi-thread model? Google didn't throw
> > any definitive answer to that. Usually it depends upon the OS and
> > architecture.
> >
> > Maybe a good start is to start using threads instead of parallel
> > workers e.g. for parallel vacuum, parallel query and so on while
> > leaving the processes for connections and leaders. that itself might
> > take significant time. Based on that experience move to a completely
> > threaded model. Based on my experience with other similar products, I
> > think we will settle on a multi-process multi-thread model.
>
> I think we have a few known problems that we might be able to solve
> without threads, but can help us eventually move to threads if we find
> it useful:
>
> 1)  Use threads for background workers rather than processes
> 2)  Allow sessions to be stopped and started by saving their state
>
> Ideally we would solve the problem of making shared structures
> resizable, but I am not sure how that can be easily done without
> threads.
>
> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
>   Only you can decide what is important to you.



-- 
Best Wishes,
Ashutosh Bapat




Re: Let's make PostgreSQL multi-threaded

2023-06-15 Thread Konstantin Knizhnik




On 15.06.2023 12:04 PM, Hannu Krosing wrote:

> So a fair bit of work but also clearly defined benefits of
> 1) reduced memory usage
> 2) no need to rebuild caches for each new connection
> 3) no need to track PREPARE statements inside connection poolers.


A shared plan cache (not only a prepared statements cache) also opens the
way to more sophisticated query optimizations.
Right now we do not perform some optimizations (like constant
expression folding) just because they increase the time of processing normal
queries.
This is why queries generated by ORMs or wizards, which can contain a
lot of dumb stuff, are not well simplified by Postgres.
With MS-Sql it is quite frequent that query execution time is much
smaller than query optimization time.
Having a shared plan cache allows us to spend more time on optimization
without the risk of degrading performance.






Re: Let's make PostgreSQL multi-threaded

2023-06-15 Thread Konstantin Knizhnik




On 15.06.2023 11:41 AM, James Addison wrote:

> On Thu, 15 Jun 2023 at 08:12, Konstantin Knizhnik  wrote:
> > On 15.06.2023 1:23 AM, James Addison wrote:
> > > On Tue, 13 Jun 2023 at 07:55, Konstantin Knizhnik  wrote:
> > > > On 12.06.2023 3:23 PM, Pavel Borisov wrote:
> > > > > Is the following true or not?
> > > > >
> > > > > 1. If we switch processes to threads but leave the amount of session
> > > > > local variables unchanged, there would be hardly any performance gain.
> > > > > 2. If we move some backend's local variables into shared memory then
> > > > > the performance gain would be very near to what we get with threads
> > > > > having equal amount of session-local variables.
> > > > >
> > > > > In other words, the overall goal in principle is to gain from less
> > > > > memory copying wherever it doesn't add the burden of locks for
> > > > > concurrent variables access?
> > > > >
> > > > > Regards,
> > > > > Pavel Borisov,
> > > > > Supabase
> > > >
> > > > IMHO both statements are not true.
> > > > Switching to threads will cause less context switch overhead (because
> > > > all threads are sharing the same memory space and so preserve the TLB).
> > > > How big will this advantage be? In my prototype I got ~10%. But maybe
> > > > it is possible to find workloads where it is larger.
> > >
> > > Hi Konstantin - do you have code/links that you can share for the
> > > prototype and benchmarks used to gather those results?
> >
> > Sorry, I have already shared the link:
> > https://github.com/postgrespro/postgresql.pthreads/
>
> Nope, my mistake for not locating the existing link - thank you.
>
> Is there a reason that parser-related files (flex/bison) are added as
> part of the changeset?  (I'm trying to narrow it down to only the
> changes necessary for the functionality.  so far it looks mostly
> fairly minimal, which is good.  the adjustments to progname are
> another thing that look a bit unusual/maybe unnecessary for the
> feature)


Sorry, absolutely no reason - just my fault.


> > As you can see, the last commit was 6 years ago, when I stopped work on
> > this project.
> > Why?  I already tried to explain it:
> > - benefits from switching to threads were not so large. Maybe I just
> > failed to find a proper workload, but it was a more or less expected
> > result, because most of the code was not changed - it uses the same sync
> > primitives, the same local catalog/relation caches,..
> > To take full advantage of the multithreading model it is necessary to
> > rewrite many components, especially those related to interprocess
> > communication.
> > But maintaining such a fork of Postgres and synchronizing it with
> > mainstream requires too much effort and I was not able to do it myself.
>
> I get the feeling that there are probably certain query types or
> patterns where a significant, order-of-magnitude speedup is possible
> with threads - but yep, I haven't seen those described in detail yet
> on the mailing list (but as hinted by my not noticing the github link
> previously, maybe I'm not following the list closely enough).
>
> What workloads did you try with your version of the project?


I do not remember precisely now (6 years have passed).
But I definitely tried pgbench, especially read-only pgbench (to be
CPU-bound rather than disk-bound).







Re: Let's make PostgreSQL multi-threaded

2023-06-15 Thread Hannu Krosing
One more unexpected benefit of having shared caches would be easing
access to other databases.

If the system caches are there for all databases anyway, then it
becomes much easier to make queries using objects from multiple
databases.

Note that this does not strictly need threads, just shared caches.

On Thu, Jun 15, 2023 at 11:04 AM Hannu Krosing  wrote:
>
> On Thu, Jun 15, 2023 at 10:41 AM James Addison  wrote:
> >
> > This is making me wonder about other performance/scalability areas
> > that might not have been considered due to focus on the details of the
> > existing codebase, but I'll save that for another thread and will try
> > to learn more first.
>
> A gradual move to more shared structures seems to be a way forward
>
> It should get us all the benefits of threading minus the need for TLB
> reloading and (in some cases) reduction of per-process virtual memory
> mapping tables.
>
> In any case we would need to implement all the locking and parallelism
> management of these shared structures that are not there in the
> current process architecture.
>
> So a fair bit of work but also clearly defined benefits of
> 1) reduced memory usage
> 2) no need to rebuild caches for each new connection
> 3) no need to track PREPARE statements inside connection poolers.
>
> There can be extra complexity when different connections use the same
> prepared statement name (say "PREP001") for different queries.
> For this we will likely need good cooperation with the connection
> pooler, where it passes some kind of client connection id along at
> transaction start.




Re: Let's make PostgreSQL multi-threaded

2023-06-15 Thread Hannu Krosing
On Thu, Jun 15, 2023 at 10:41 AM James Addison  wrote:
>
> This is making me wonder about other performance/scalability areas
> that might not have been considered due to focus on the details of the
> existing codebase, but I'll save that for another thread and will try
> to learn more first.

A gradual move to more shared structures seems to be a way forward.

It should get us all the benefits of threading except for avoiding TLB
reloads and (in some cases) the reduction in per-process virtual memory
mapping tables.

In any case we would need to implement all the locking and parallelism
management of these shared structures that are not there in the
current process architecture.

So a fair bit of work but also clearly defined benefits of
1) reduced memory usage
2) no need to rebuild caches for each new connection
3) no need to track PREPARE statements inside connection poolers.

There can be extra complexity when different connections use the same
prepared statement name (say "PREP001") for different queries.
For this we will likely need good cooperation with the connection
pooler, where it passes some kind of client connection id along at
transaction start.
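
One way to picture the name-collision issue is the toy cache below, sketched in Python. Plans are shared by query text, while statement names are resolved per client connection id, so two clients can both use "PREP001" for different queries. Everything here is an assumption made for illustration (the class, the key layout including user and database, the placeholder "plan" strings); it is not a proposed design.

```python
import threading


class SharedStatementCache:
    """Toy shared cache: plans keyed by (user, database, SQL text)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._plans = {}   # (user, database, sql) -> plan
        self._names = {}   # (client_id, statement name) -> plan key
        self.plans_built = 0

    def prepare(self, client_id, user, database, name, sql):
        key = (user, database, sql)
        with self._lock:
            if key not in self._plans:
                # Stand-in for real parsing and planning.
                self._plans[key] = f"plan for: {sql}"
                self.plans_built += 1
            # The name is private to this client connection.
            self._names[(client_id, name)] = key

    def lookup(self, client_id, name):
        with self._lock:
            return self._plans[self._names[(client_id, name)]]


cache = SharedStatementCache()
cache.prepare("client-1", "app", "db", "PREP001", "SELECT 1")
cache.prepare("client-2", "app", "db", "PREP001", "SELECT 2")
cache.prepare("client-3", "app", "db", "PREP001", "SELECT 1")

print(cache.lookup("client-1", "PREP001"))
print(cache.lookup("client-2", "PREP001"))
print(cache.plans_built)
```

Three clients prepared a statement under the same name, but only two plans were built, and each client still gets back the plan for its own query. The client id is exactly the piece of information the pooler would have to pass along.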




Re: Let's make PostgreSQL multi-threaded

2023-06-15 Thread Hannu Krosing
On Thu, Jun 15, 2023 at 9:12 AM Konstantin Knizhnik  wrote:

> There are three different but related directions of improving current 
> Postgres:
> 1. Replacing processes with threads

Here we could likely start with making parallel query multi-threaded.

This would also remove the big blocker for parallelizing things like
CREATE TABLE AS SELECT ... where we are currently held back by the
restriction that only the leader process can write.

> 2. Builtin connection pooler

Would be definitely a nice thing to have. And we could even start by
integrating a non-threaded pooler like pgbouncer to run as a
postgresql worker process (or two).

> 3. Lightweight backends (shared catalog/relation/prepared statements caches)

Shared prepared statement caches (of course have to be per-user and
per-database) would give additional benefit of lightweight connection
poolers not needing to track these. Currently the missing support of
named prepared statements is one of the main hindrances of using
pgbouncer with JDBC in transaction pooling mode (you can use it, but
have to turn off automatic statement preparing)

>
> The motivation for such changes are also similar:
> 1. Increase Postgres scalability
> 2. Reduce memory consumption
> 3. Make Postgres better fit cloud and serverless requirements

The memory consumption reduction would be a big and clear win for many
workloads.

Also, just moving more things into shared memory will prepare us for a
move to a threaded server (if it eventually happens).

> I am not sure now which one should be addressed first or them can be done 
> together.

Shared caches seem like a guaranteed win at least on memory usage.
There could be performance  (and complexity) downsides for specific
workloads, but they would be the same as for the threaded model, so
would also be a good learning opportunity.

> Replacing static variables with thread-local is the first and may be the 
> easiest step.

I think we got our first patch doing this (as part of patches for
running PG threaded on Solaris) quite early in the OSS development,
could have been even in the last century :)
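
For readers who have not done this kind of conversion, here is the idea of "static variable becomes thread-local" as a small Python analogue. In C the change would be a storage-class annotation such as _Thread_local (or the GCC __thread extension) on each global; Python's threading.local gives the same per-thread isolation and is used here only to illustrate the effect. The variable and function names are made up.

```python
import threading

# Formerly a plain module-level global; now each thread gets its own copy.
_session = threading.local()


def set_current_user(name):
    _session.current_user = name


def get_current_user():
    return getattr(_session, "current_user", None)


seen = {}


def backend(name):
    # Each "backend" thread sets and reads what looks like a global.
    set_current_user(name)
    seen[name] = get_current_user()


threads = [threading.Thread(target=backend, args=(n,)) for n in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()

main_value = get_current_user()
print(seen, main_value)
```

The main thread never called set_current_user, so it still sees None even though both worker threads assigned a value; with an ordinary global it would see whichever thread wrote last.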

> It requires more or less mechanical changes. More challenging thing is 
> replacing private per-backend data structures
> with shared ones (caches, file descriptors,...)

Indeed, sharing caches would be also part of the work that is needed
for the sharded model, so anyone feeling strongly about moving to
threads could start with this :)

---
Hannu




Re: Let's make PostgreSQL multi-threaded

2023-06-15 Thread James Addison
On Thu, 15 Jun 2023 at 08:12, Konstantin Knizhnik  wrote:
>
>
>
> On 15.06.2023 1:23 AM, James Addison wrote:
>
> On Tue, 13 Jun 2023 at 07:55, Konstantin Knizhnik  wrote:
>
>
> On 12.06.2023 3:23 PM, Pavel Borisov wrote:
>
> Is the following true or not?
>
> 1. If we switch processes to threads but leave the amount of session
> local variables unchanged, there would be hardly any performance gain.
> 2. If we move some backend's local variables into shared memory then
> the performance gain would be very near to what we get with threads
> having equal amount of session-local variables.
>
> In other words, the overall goal in principle is to gain from less
> memory copying wherever it doesn't add the burden of locks for
> concurrent variables access?
>
> Regards,
> Pavel Borisov,
> Supabase
>
>
> IMHO both statements are not true.
> Switching to threads will cause less context switch overhead (because
> all threads are sharing the same memory space and so preserve the TLB).
> How big will this advantage be? In my prototype I got ~10%. But maybe
> it is possible to find workloads where it is larger.
>
> Hi Konstantin - do you have code/links that you can share for the
> prototype and benchmarks used to gather those results?
>
>
>
> Sorry, I have already shared the link:
> https://github.com/postgrespro/postgresql.pthreads/

Nope, my mistake for not locating the existing link - thank you.

Is there a reason that parser-related files (flex/bison) are added as
part of the changeset?  (I'm trying to narrow it down to only the
changes necessary for the functionality.  so far it looks mostly
fairly minimal, which is good.  the adjustments to progname are
another thing that look a bit unusual/maybe unnecessary for the
feature)

> As you can see, the last commit was 6 years ago, when I stopped work on this
> project.
> Why?  I already tried to explain it:
> - benefits from switching to threads were not so large. Maybe I just failed
> to find a proper workload, but it was a more or less expected result,
> because most of the code was not changed - it uses the same sync primitives,
> the same local catalog/relation caches,..
> To take full advantage of the multithreading model it is necessary to rewrite
> many components, especially those related to interprocess communication.
> But maintaining such a fork of Postgres and synchronizing it with mainstream
> requires too much effort and I was not able to do it myself.

I get the feeling that there are probably certain query types or
patterns where a significant, order-of-magnitude speedup is possible
with threads - but yep, I haven't seen those described in detail yet
on the mailing list (but as hinted by my not noticing the github link
previously, maybe I'm not following the list closely enough).

What workloads did you try with your version of the project?

> There are three different but related directions of improving current 
> Postgres:
> 1. Replacing processes with threads
> 2. Builtin connection pooler
> 3. Lightweight backends (shared catalog/relation/prepared statements caches)
>
> The motivations for such changes are also similar:
> 1. Increase Postgres scalability
> 2. Reduce memory consumption
> 3. Make Postgres better fit cloud and serverless requirements
>
> I am not sure now which one should be addressed first or whether they can be
> done together.
>
> Replacing static variables with thread-local ones is the first and maybe the
> easiest step.
> It requires more or less mechanical changes. The more challenging thing is
> replacing private per-backend data structures
> with shared ones (caches, file descriptors,...)

Thank you.  Personally I think that motivation two (reducing memory
consumption) -- as long as it can be done without detrimentally
affecting functionality or correctness, and without making the code
harder to develop/understand -- could provide benefits for all three
of the motivating cases (and, in fact, for non-cloud/serverful use
cases too).

This is making me wonder about other performance/scalability areas
that might not have been considered due to focus on the details of the
existing codebase, but I'll save that for another thread and will try
to learn more first.




Re: Let's make PostgreSQL multi-threaded

2023-06-15 Thread Konstantin Knizhnik



On 15.06.2023 1:23 AM, James Addison wrote:
> On Tue, 13 Jun 2023 at 07:55, Konstantin Knizhnik  wrote:
> > On 12.06.2023 3:23 PM, Pavel Borisov wrote:
> > > Is the following true or not?
> > >
> > > 1. If we switch processes to threads but leave the amount of session
> > > local variables unchanged, there would be hardly any performance gain.
> > > 2. If we move some backend's local variables into shared memory then
> > > the performance gain would be very near to what we get with threads
> > > having equal amount of session-local variables.
> > >
> > > In other words, the overall goal in principle is to gain from less
> > > memory copying wherever it doesn't add the burden of locks for
> > > concurrent variables access?
> > >
> > > Regards,
> > > Pavel Borisov,
> > > Supabase
> >
> > IMHO both statements are not true.
> > Switching to threads will cause less context switch overhead (because
> > all threads share the same memory space and so preserve the TLB).
> > How big will this advantage be? In my prototype I got ~10%. But maybe
> > it is possible to find workloads where it is larger.
>
> Hi Konstantin - do you have code/links that you can share for the
> prototype and benchmarks used to gather those results?



Sorry, I have already shared the link:
https://github.com/postgrespro/postgresql.pthreads/

As you can see, the last commit was 6 years ago, when I stopped work on this
project.

Why?  I already tried to explain it:
- benefits from switching to threads were not so large. Maybe I just
failed to find a proper workload, but it was a more or less expected result,
because most of the code was not changed - it uses the same sync
primitives, the same local catalog/relation caches,..
To take full advantage of the multithreading model it is necessary to rewrite
many components, especially those related to interprocess communication.
But maintaining such a fork of Postgres and synchronizing it with mainstream
requires too much effort and I was not able to do it myself.


There are three different but related directions of improving current 
Postgres:

1. Replacing processes with threads
2. Builtin connection pooler
3. Lightweight backends (shared catalog/relation/prepared statements caches)

The motivations for such changes are also similar:
1. Increase Postgres scalability
2. Reduce memory consumption
3. Make Postgres better fit cloud and serverless requirements

I am not sure now which one should be addressed first or whether they can be
done together.


Replacing static variables with thread-local ones is the first and maybe the
easiest step.
It requires more or less mechanical changes. The more challenging thing is
replacing private per-backend data structures
with shared ones (caches, file descriptors,...)


Re: Let's make PostgreSQL multi-threaded

2023-06-14 Thread James Addison
On Tue, 13 Jun 2023 at 07:55, Konstantin Knizhnik  wrote:
>
>
>
> On 12.06.2023 3:23 PM, Pavel Borisov wrote:
> > Is the following true or not?
> >
> > 1. If we switch processes to threads but leave the amount of session
> > local variables unchanged, there would be hardly any performance gain.
> > 2. If we move some backend's local variables into shared memory then
> > the performance gain would be very near to what we get with threads
> > having equal amount of session-local variables.
> >
> > In other words, the overall goal in principle is to gain from less
> > memory copying wherever it doesn't add the burden of locks for
> > concurrent variables access?
> >
> > Regards,
> > Pavel Borisov,
> > Supabase
> >
> >
> IMHO both statements are not true.
> Switching to threads will cause less context switch overhead (because
> all threads share the same memory space and so preserve the TLB).
> How big will this advantage be? In my prototype I got ~10%. But maybe
> it is possible to find workloads where it is larger.

Hi Konstantin - do you have code/links that you can share for the
prototype and benchmarks used to gather those results?




Re: Let's make PostgreSQL multi-threaded

2023-06-14 Thread James Addison
On Wed, 14 Jun 2023 at 20:48, Robert Haas  wrote:
>
> On Wed, Jun 14, 2023 at 3:16 PM James Addison  wrote:
> > I think that they're practical performance-related questions about the
> > benefits of performing a technical migration that could involve
> > significant development time, take years to complete, and uncover
> > problems that cause reliability issues for a stable, proven database
> > management system.
>
> I don't. I think they're reflecting confusion about what the actual,
> practical path forward is.

Ok.  My concern is that the balance between the downstream ecosystem
impact (people and processes that use PIDs to identify, monitor and
manage query and background processes, for example) compared to the
benefits (performance improvement for some -- but what kind of? --
workloads) seems unclear, and if it's unclear, it's less likely to be
compelling.

Pavel's message and questions seem to poke at some of the potential
limitations of the performance improvements, and Andres' response
mentions reduced complexity and reduced context-switching.  Elsewhere
I also see that TLB (translation lookaside buffer?) lookups in
particular should see improvements.  Those are good, but somewhat
unquantified.

The benefits are less of an immediate concern if there's going to be a
migration/transition phase where both the process model and the thread
model are available.  But again, if the benefits of the threading
model aren't clear, people are unlikely to want to switch, and I don't
think that the cost for people and systems to migrate from tooling and
methods built around processes will be zero.  That could lead to a bad
outcome, where the codebase includes both models and yet is unable to
plan to simplify to one.




Re: Let's make PostgreSQL multi-threaded

2023-06-14 Thread Robert Haas
On Wed, Jun 14, 2023 at 3:46 PM Hannu Krosing  wrote:
> I remember a few times when memory leaks in some PostGIS packages
> caused slow memory exhaustion and the simple fix was limiting
> connection lifetime to something between 15 min and an hour.
>
> The main problem here is that PostGIS uses a few tens of other GPL GIS
> related packages which are all changing independently and thus it is
> quite hard to be sure that none of these have developed a leak. And
> you also likely can not just stop upgrading these as they also contain
> security fixes.
>
> I have no idea what the fix could be in the case of a threaded server.

Presumably, when a thread exits, we
MemoryContextDelete(TopMemoryContext). If the leak is into any memory
context managed by PostgreSQL, this still frees the memory. But it
might not be. Right now, if a library does a malloc() that it doesn't
free() every once in a while, it's no big deal. If it does it too
often, it's a problem now, too. But if it does it only every now and
then, process exit will prevent accumulation over time. In a threaded
model, that isn't true any longer: those allocations will accumulate
until we OOM.

And IMHO that's definitely a very significant downside of this
direction. I don't think it should be dispositive because such
problems are, hopefully, fixable, whereas some of the problems caused
by the process model are basically unfixable except by not using it
any more. However, if we lived in a world where both models were
supported and a particular user said, "hey, I'm sticking with the
process model because I don't trust my third-party libraries not to
leak," I would be like "yep, I totally get it."

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Let's make PostgreSQL multi-threaded

2023-06-14 Thread Andres Freund
Hi,

On 2023-06-13 16:55:12 +0900, Kyotaro Horiguchi wrote:
> At Tue, 13 Jun 2023 09:55:36 +0300, Konstantin Knizhnik  
> wrote in 
> > Postgres backend is "thick" not because of large number of local
> > variables.
> > It is because of local caches: catalog cache, relation cache, prepared
> > statements cache,...
> > If they are not rewritten, then backend still may consume a lot of
> > memory even if it will be thread rather than process.
> > But threads simplify development of global caches, although it can be
> > done with DSM.
> 
> With the process model, that local stuff is flushed out upon
> reconnection. If we switch to the thread model, we will need an
> expiration mechanism for that stuff.

Isn't that just doing something like MemoryContextDelete(TopMemoryContext) at
the end of proc_exit() (or its thread equivalent)?
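
A minimal sketch of that idea, assuming POSIX threads and C11 atomics. `ToyContext`, `toy_alloc`, `toy_context_delete`, `session_thread` and `bytes_left_after_sessions` are hypothetical names for illustration and are not PostgreSQL's memory context API; the point is only that a thread which tracks its allocations in one top-level context can release them all in one call before it exits.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Header placed in front of every allocation made through the toy context. */
typedef struct Chunk {
    struct Chunk *next;
    size_t size;
} Chunk;

/* Toy stand-in for a per-thread top-level memory context. */
typedef struct ToyContext {
    Chunk *chunks;
} ToyContext;

/* Bytes currently held by all toy contexts; exists only for the demo. */
static atomic_size_t live_bytes;

/* Allocate through the context so the chunk can be found again later. */
static void *toy_alloc(ToyContext *cxt, size_t size)
{
    Chunk *c;

    assert(cxt != NULL);
    c = malloc(sizeof(Chunk) + size);
    if (c == NULL)
        return NULL;
    c->size = size;
    c->next = cxt->chunks;
    cxt->chunks = c;
    atomic_fetch_add(&live_bytes, size);
    return c + 1;               /* payload starts right after the header */
}

/* Free every chunk the context ever handed out. */
static void toy_context_delete(ToyContext *cxt)
{
    Chunk *c = cxt->chunks;

    while (c != NULL) {
        Chunk *next = c->next;

        atomic_fetch_sub(&live_bytes, c->size);
        free(c);
        c = next;
    }
    cxt->chunks = NULL;
}

/* A "session": allocates freely, then cleans up once at thread exit. */
static void *session_thread(void *arg)
{
    ToyContext top = { NULL };

    (void) arg;
    for (int i = 0; i < 100; i++)
        (void) toy_alloc(&top, 64);     /* never freed one by one */
    toy_context_delete(&top);           /* the thread-exit cleanup step */
    return NULL;
}

/* Runs four sessions and reports how many tracked bytes survived them. */
static size_t bytes_left_after_sessions(void)
{
    pthread_t threads[4];
    int started = 0;

    for (int i = 0; i < 4; i++) {
        if (pthread_create(&threads[i], NULL, session_thread, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return atomic_load(&live_bytes);
}
```

Memory that a third-party library obtains with a plain malloc() never appears on the chunk list, so this kind of cleanup cannot reclaim it.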

Greetings,

Andres Freund




Re: Let's make PostgreSQL multi-threaded

2023-06-14 Thread Robert Haas
On Wed, Jun 14, 2023 at 3:16 PM James Addison  wrote:
> I think that they're practical performance-related questions about the
> benefits of performing a technical migration that could involve
> significant development time, take years to complete, and uncover
> problems that cause reliability issues for a stable, proven database
> management system.

I don't. I think they're reflecting confusion about what the actual,
practical path forward is.

For a first cut at this, all of our global variables become
thread-local. Every single last one of them. So there's no savings of
the type described in that email. We do each and every thing just as
we do it today, except that it's all in different parts of a single
address space instead of different address spaces with a chunk of
shared memory mapped into each one. Syscaches don't change, catcaches
don't change, memory copying is not reduced, literally nothing
changes. The coding model is just as it is today. Except for
decorating global variables, virtually no backend code needs to notice
or care about the transition. There are a few exceptions. For
instance, TopMemoryContext would need to be deleted explicitly, and
the FD caching stuff would have to be revised, because it uses up all
the FDs that the process can open, and having many threads doing that
in a single process isn't going to work. There's probably some other
things that I'm forgetting, but the typical effect on the average bit
of backend code should be very, very low. If it isn't, we're doing it
wrong.
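
A minimal sketch of what "decorating global variables" could look like, assuming C11 `_Thread_local` and POSIX threads. `session_counter`, `backend_main` and `run_two_backends` are hypothetical names, not PostgreSQL symbols. Each thread gets its own copy of the former global, so code that reads and writes it does not need to change.

```c
#include <assert.h>
#include <pthread.h>

/* Formerly a plain per-process global; now every thread has its own copy. */
static _Thread_local int session_counter = 0;

/* Stand-in for a backend's main loop: bumps its private counter n times. */
static void *backend_main(void *arg)
{
    int *result = arg;          /* in: iteration count, out: final counter */
    int n = *result;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
        session_counter++;      /* no locking: the variable is thread-private */
    *result = session_counter;
    return NULL;
}

/* Runs two "backends" concurrently; returns 1 if they did not interfere. */
static int run_two_backends(void)
{
    pthread_t a, b;
    int ra = 1000, rb = 2000;

    if (pthread_create(&a, NULL, backend_main, &ra) != 0)
        return 0;
    if (pthread_create(&b, NULL, backend_main, &rb) != 0) {
        pthread_join(a, NULL);
        return 0;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    /* Each thread saw only its own increments; the caller's copy is untouched. */
    return ra == 1000 && rb == 2000 && session_counter == 0;
}
```

Had `session_counter` been an ordinary global, the two threads would have raced on it and the final values would be unpredictable.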

So, I think saying "oh, this is going to destabilize PostgreSQL for
years" is just fear-mongering. If someone proposes a patch that we
think is going to have that effect, we should (and certainly will)
reject it. But I see no reason why we can't have a good patch for this
where most code changes only in mechanical ways that are easy to
validate.

> This reads like a code quality argument: that's worthwhile, but I
> don't see how it supports your 'False' assertions.  Do two queries
> running in separate processes spend much time allocating and waiting
> on resources that could be shared within a single thread?

I don't have any idea what this has to do with what Andres was talking
about, honestly. However, there certainly are cases of the thing
you're talking about here. Having many backends separately open the
same file means we've got a whole bunch of different file descriptors
accessing the same file instead of just one. That does have a
meaningful cost on some workloads. Passing tuples between cooperating
processes that are jointly executing a parallel query is costly in the
current scheme, too. There might be ways to improve on that somewhat
even without threads, but if you don't think that the process model
made getting parallel query working harder and less efficient, I'm
here as the guy who wrote a lot of that code to tell you that it very
much did.
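
A minimal sketch of why tuple passing can be cheaper between threads, assuming POSIX threads. `ToyTuple`, `PtrQueue`, `worker_main` and `gather_sum` are hypothetical names and this is not shm_mq. Because producer and consumer share one address space, the queue carries only a pointer and the tuple data itself is never copied.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define QUEUE_SLOTS 8

typedef struct ToyTuple {
    int value;
} ToyTuple;

/* Bounded queue of pointers, guarded by one mutex and one condition variable. */
typedef struct PtrQueue {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    ToyTuple *slots[QUEUE_SLOTS];
    int head;
    int count;
} PtrQueue;

static void queue_init(PtrQueue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->changed, NULL);
    q->head = 0;
    q->count = 0;
}

/* Blocks while the queue is full, then stores the pointer itself. */
static void queue_put(PtrQueue *q, ToyTuple *t)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_SLOTS)
        pthread_cond_wait(&q->changed, &q->lock);
    q->slots[(q->head + q->count) % QUEUE_SLOTS] = t;
    q->count++;
    pthread_cond_broadcast(&q->changed);
    pthread_mutex_unlock(&q->lock);
}

/* Blocks while the queue is empty, then returns the oldest pointer. */
static ToyTuple *queue_get(PtrQueue *q)
{
    ToyTuple *t;

    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->changed, &q->lock);
    t = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_SLOTS;
    q->count--;
    pthread_cond_broadcast(&q->changed);
    pthread_mutex_unlock(&q->lock);
    return t;
}

typedef struct WorkerArgs {
    PtrQueue *q;
    int ntuples;
} WorkerArgs;

/* Worker: builds tuples with values 1..ntuples and hands over the pointers. */
static void *worker_main(void *arg)
{
    WorkerArgs *w = arg;

    for (int i = 1; i <= w->ntuples; i++) {
        ToyTuple *t = malloc(sizeof *t);

        if (t == NULL)
            break;
        t->value = i;
        queue_put(w->q, t);
    }
    queue_put(w->q, NULL);      /* NULL marks the end of the stream */
    return NULL;
}

/* Leader: consumes the pointers, sums the values and frees each tuple. */
static long gather_sum(int ntuples)
{
    PtrQueue q;
    WorkerArgs w = { &q, ntuples };
    pthread_t worker;
    ToyTuple *t;
    long sum = 0;

    queue_init(&q);
    if (pthread_create(&worker, NULL, worker_main, &w) != 0)
        return -1;
    while ((t = queue_get(&q)) != NULL) {
        sum += t->value;
        free(t);                /* ownership moved along with the pointer */
    }
    pthread_join(worker, NULL);
    pthread_cond_destroy(&q.changed);
    pthread_mutex_destroy(&q.lock);
    return sum;
}
```

With separate processes the same pointer would be meaningless on the receiving side unless the tuple lived in shared memory mapped at a known location, which is part of the extra work described above.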

> That seems valid.  Even so, I would expect that for many queries, I/O
> access and row processing time is the bulk of the work, and that
> context-switches to/from other query processes is relatively
> negligible.

That's completely true, but there are ALSO many OTHER situations in
which the overhead of frequent context switching is absolutely
crushing. You might as well argue that umbrellas don't need to exist
because there are lots of sunny days.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Let's make PostgreSQL multi-threaded

2023-06-14 Thread Hannu Krosing
On Tue, Jun 13, 2023 at 9:55 AM Kyotaro Horiguchi
 wrote:
>
> At Tue, 13 Jun 2023 09:55:36 +0300, Konstantin Knizhnik  
> wrote in
> > Postgres backend is "thick" not because of large number of local
> > variables.
> > It is because of local caches: catalog cache, relation cache, prepared
> > statements cache,...
> > If they are not rewritten, then backend still may consume a lot of
> > memory even if it will be thread rather than process.
> > But threads simplify development of global caches, although it can be
> > done with DSM.
>
> With the process model, that local stuff is flushed out upon
> reconnection. If we switch to the thread model, we will need an
> expiration mechanism for that stuff.

The part that can not be so easily solved is that "the local stuff"
can include some leakage that is not directly controlled by us.

I remember a few times when memory leaks in some PostGIS packages
caused slow memory exhaustion and the simple fix was limiting
connection lifetime to something between 15 min and an hour.

The main problem here is that PostGIS uses a few tens of other GPL GIS
related packages which are all changing independently and thus it is
quite hard to be sure that none of these have developed a leak. And
you also likely can not just stop upgrading these as they also contain
security fixes.

I have no idea what the fix could be in the case of a threaded server.




Re: Let's make PostgreSQL multi-threaded

2023-06-14 Thread James Addison
On Mon, 12 Jun 2023 at 20:24, Andres Freund  wrote:
>
> Hi,
>
> On 2023-06-12 16:23:14 +0400, Pavel Borisov wrote:
> > Is the following true or not?
> >
> > 1. If we switch processes to threads but leave the amount of session
> > local variables unchanged, there would be hardly any performance gain.
>
> False.
>
>
> > 2. If we move some backend's local variables into shared memory then
> > the performance gain would be very near to what we get with threads
> > having equal amount of session-local variables.
>
> False.
>
>
> > In other words, the overall goal in principle is to gain from less
> > memory copying wherever it doesn't add the burden of locks for
> > concurrent variables access?
>
> False.
>
> Those points seem pretty much unrelated to the potential gains from switching
> to a threading model. The main advantages are:

I think that they're practical performance-related questions about the
benefits of performing a technical migration that could involve
significant development time, take years to complete, and uncover
problems that cause reliability issues for a stable, proven database
management system.

> 1) We'd gain from being able to share state more efficiently (using normal
>    pointers) and more dynamically (not needing to pre-allocate). That'd remove
>    a good amount of complexity. As an example, consider the work we need to do
>    to ferry tuples from one process to another. Even if we just continue to
>    use shm_mq, in a threading world we could just put a pointer in the queue,
>    but have the tuple data be shared between the processes etc.
>
>    Eventually this could include removing the 1:1 connection<->process/thread
>    model. That's possible to do with processes as well, but considerably
>    harder.

This reads like a code quality argument: that's worthwhile, but I
don't see how it supports your 'False' assertions.  Do two queries
running in separate processes spend much time allocating and waiting
on resources that could be shared within a single thread?

> 2) Making context switches cheaper / sharing more resources at the OS and
>    hardware level.

That seems valid.  Even so, I would expect that for many queries, I/O
access and row processing time is the bulk of the work, and that
context-switches to/from other query processes is relatively
negligible.




Re: Let's make PostgreSQL multi-threaded

2023-06-14 Thread Andreas Karlsson

On 6/14/23 09:01, Kyotaro Horiguchi wrote:
> At Wed, 14 Jun 2023 08:46:05 +0300, Konstantin Knizhnik
> wrote in
> > But I do not think that it is somehow related to using threads
> > instead of processes.
> > The question whether to use private or shared cache is not directly
> > related to threads vs. process choice.
>
> Yeah, I unconsciously conflated the two things. We can use per-thread
> cache on multithreading.


For sure, and we can drop the cache when dropping the memory context. 
And in the first versions of an imagined threaded PostgreSQL I am sure 
that is how things will work.


Then later someone will have to investigate which caches are worth 
making shared and what the eviction/expiration strategy should be.


Andreas




Re: Let's make PostgreSQL multi-threaded

2023-06-14 Thread Kyotaro Horiguchi
At Wed, 14 Jun 2023 08:46:05 +0300, Konstantin Knizhnik  
wrote in 
> But I do not think that it is somehow related to using threads
> instead of processes.
> The question whether to use private or shared cache is not directly
> related to threads vs. process choice.

Yeah, I unconsciously conflated the two things. We can use per-thread
cache on multithreading.

> Yes, threads make implementation of a shared cache much easier. But it
> can also be done using dynamic
> memory segments. Definitely a shared cache has its pros and cons; first
> of all it requires synchronization
> which may have a negative impact on performance.

True.

> I have made an attempt to combine both caches: use relatively small
> per-backend local cache
> and large shared cache.
> I wonder what people think about the idea to make backends less thick
> by using shared cache.

I remember a relatively old thread about that.

https://www.postgresql.org/message-id/4E72940DA2BF16479384A86D54D0988A567B9245%40G01JPEXMBKW04

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Let's make PostgreSQL multi-threaded

2023-06-13 Thread Konstantin Knizhnik




On 13.06.2023 11:46 AM, Kyotaro Horiguchi wrote:
> > So we can assume that catalog and relation cache should always fit in
> > memory (otherwise significant rewriting of all Postgres code working
> > with relations will be needed).
>
> I'm not sure that is true.. But likely to be?


Sorry, looks like I was wrong.
Right now access to sys/cat/rel caches is protected by a reference counter.
So we can easily add some replacement algorithm for these caches.
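
A minimal sketch of how a reference counter makes replacement safe, in single-threaded C. `ToyCache`, `cache_pin`, `cache_unpin` and `demo_evicted_key` are hypothetical names and do not reflect the real syscache/relcache code. Entries whose `refcount` is above zero are never chosen as victims; among the rest, the least recently used one is replaced.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 3

typedef struct CacheEntry {
    int key;                    /* 0 means the slot is empty */
    int refcount;               /* > 0 while some caller still uses the entry */
    unsigned long last_used;    /* logical time of the most recent pin */
} CacheEntry;

typedef struct ToyCache {
    CacheEntry slots[CACHE_SLOTS];
    unsigned long clock;
} ToyCache;

/*
 * Return a pinned entry for key, loading it into the cache if needed.
 * Returns NULL when every slot is pinned and nothing can be replaced.
 */
static CacheEntry *cache_pin(ToyCache *c, int key)
{
    CacheEntry *victim = NULL;

    assert(key != 0);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        CacheEntry *e = &c->slots[i];

        if (e->key == key) {            /* hit: just take another reference */
            e->refcount++;
            e->last_used = ++c->clock;
            return e;
        }
        /* Only unpinned entries may be replaced; prefer the oldest one. */
        if (e->refcount == 0 &&
            (victim == NULL || e->last_used < victim->last_used))
            victim = e;
    }
    if (victim == NULL)
        return NULL;
    victim->key = key;                  /* miss: reuse the victim's slot */
    victim->refcount = 1;
    victim->last_used = ++c->clock;
    return victim;
}

/* Drop one reference; the entry becomes replaceable when the count hits 0. */
static void cache_unpin(CacheEntry *e)
{
    assert(e != NULL && e->refcount > 0);
    e->refcount--;
}

static int cache_contains(const ToyCache *c, int key)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (c->slots[i].key == key)
            return 1;
    return 0;
}

/* Fills the cache, keeps key 1 pinned, then forces one replacement. */
static int demo_evicted_key(void)
{
    ToyCache c = {0};
    CacheEntry *pinned = cache_pin(&c, 1);  /* oldest entry, but still in use */
    int evicted = 0;

    cache_unpin(cache_pin(&c, 2));
    cache_unpin(cache_pin(&c, 3));
    cache_unpin(cache_pin(&c, 4));          /* cache is full: must evict */
    for (int key = 1; key <= 3; key++)
        if (!cache_contains(&c, key))
            evicted = key;
    cache_unpin(pinned);
    return evicted;
}
```

In the demo, key 1 is the oldest entry but survives because it is still referenced, so key 2 is the one replaced.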


> I don't think it is efficient for PostgreSQL to consume a large
> amount of memory for seldom-used content. While we may not need
> expiration mechanism for moderate use cases, I have observed instances
> where a single process hogs a significant amount of memory,
> particularly for intermittent tasks.


Usually the system catalog is small enough and does not cause any problems
with memory consumption.

But partitioned and temporary tables can cause bloat of the catalog.
In such cases some eviction mechanism will be really useful.
But I do not think that it is somehow related to using threads instead
of processes.
The question whether to use private or shared cache is not directly
related to threads vs. process choice.
Yes, threads make implementation of a shared cache much easier. But it
can also be done using dynamic
memory segments. Definitely a shared cache has its pros and cons; first of
all it requires synchronization
which may have a negative impact on performance.

I have made an attempt to combine both caches: use relatively small
per-backend local cache
and large shared cache.
I wonder what people think about the idea to make backends less thick by
using shared cache.






Re: Let's make PostgreSQL multi-threaded

2023-06-13 Thread Andreas Karlsson

On 6/13/23 10:20, Konstantin Knizhnik wrote:
> The fact that it is flushed out upon reconnection can not
> help much: what if backends are not going to disconnect?


This is why many connection pools have a maximum connection lifetime 
which can be configured. So in practice flushing all caches on 
disconnect helps a lot.


The nice proper solution might very well be adding a maximum cache size
and replacement, but it obviously makes the cache more complex and adds
a new GUC. Probably worth it, but flushing caches on disconnect is a
simple solution which works well in practice for many but not all workloads.


Andreas





Re: Let's make PostgreSQL multi-threaded

2023-06-13 Thread Kyotaro Horiguchi
At Tue, 13 Jun 2023 11:20:56 +0300, Konstantin Knizhnik  
wrote in 
> 
> 
> On 13.06.2023 10:55 AM, Kyotaro Horiguchi wrote:
> > At Tue, 13 Jun 2023 09:55:36 +0300, Konstantin Knizhnik
> >  wrote in
> >> Postgres backend is "thick" not because of large number of local
> >> variables.
> >> It is because of local caches: catalog cache, relation cache, prepared
> >> statements cache,...
> >> If they are not rewritten, then backend still may consume a lot of
> >> memory even if it will be thread rather than process.
> >> But threads simplify development of global caches, although it can be
> >> done with DSM.
> > With the process model, that local stuff is flushed out upon
> > reconnection. If we switch to the thread model, we will need an
> > expiration mechanism for that stuff.
> 
> We already have invalidation mechanism. It will be also used in case
> of shared cache, but we do not need to send invalidations to all
> backends.

Invalidation is not expiration.

> I do not completely understand your point.
> Right now caches (for example catalog cache) are not limited at all.
> So if you have very large database schema, then this cache will
> consume a lot of memory (multiplied by number of
> backends). The fact that it is flushed out upon reconnection can not
> help much: what if backends are not going to disconnect?

Right now, if one out of many backends creates a huge system catalog
cache, it can be cleared upon disconnection.  The same client can
repeat this process, but users can ensure such situations don't
persist. However, with the thread model, we won't be able to clear
parts of the cache that aren't required by the active backends
anymore. (Of course with threads, we can avoid duplications, though.)

> In case of shared cache we will have to address the same problem:
> whether this cache should be limited (with some replacement discipline
> as LRU).
> Or it is unlimited. In case of shared cache, size of the cache is less
> critical because it is not multiplied by number of backends.

Yes.

> So we can assume that catalog  and relation cache should always fit in
> memory (otherwise significant rewriting of all Postgres code working
> with relations will be needed).

I'm not sure that is true.. But likely to be?

> But Postgres also has temporary tables. For them we may need local
> backend cache in any case.
> Global temp table patch was not approved so we still have to deal with
> these awful temp tables.
> 
> In any case I do not understand why we need some expiration
> mechanism for these caches.

I don't think it is efficient for PostgreSQL to consume a large
amount of memory for seldom-used content. While we may not need
expiration mechanism for moderate use cases, I have observed instances
where a single process hogs a significant amount of memory,
particularly for intermittent tasks.

> If there is some relation then information about this relation should
> be kept in the cache as long as this relation is alive.
> If there is not enough memory to cache information about all
> relations, then we may need some replacement algorithm.
> But I do not think that there is any sense to remove some item from the
> cache just because it is too old.

Ah. I see. I am fine with a replacement mechanism. But the eviction
algorithm seems almost identical to the expiration algorithm. The
algorithm will not be simply driven by object age, but I'm not sure we
need more than access frequency.
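
A minimal sketch of replacement driven by access frequency rather than age, written as a clock sweep over usage counters. `FreqCache`, `freq_cache_access`, `MAX_USAGE` and `demo_frequency_eviction` are hypothetical names and this is not how the catalog caches currently work. Every hit raises an entry's counter up to a cap; on a miss a hand sweeps the slots, lowering counters until it finds one at zero.

```c
#include <assert.h>

#define NSLOTS 4
#define MAX_USAGE 5

typedef struct FreqSlot {
    int key;        /* 0 means the slot is empty */
    int usage;      /* raised on every hit, lowered by the sweeping hand */
} FreqSlot;

typedef struct FreqCache {
    FreqSlot slots[NSLOTS];
    int hand;       /* next slot the sweep will look at */
} FreqCache;

/*
 * Access key, inserting it on a miss.  Returns the key that was evicted
 * to make room, or 0 if nothing was evicted.
 */
static int freq_cache_access(FreqCache *c, int key)
{
    assert(key != 0);

    for (int i = 0; i < NSLOTS; i++) {
        if (c->slots[i].key == key) {
            if (c->slots[i].usage < MAX_USAGE)
                c->slots[i].usage++;
            return 0;
        }
    }

    /* Miss: sweep until a slot with zero usage turns up.  The loop ends
     * because every pass lowers each non-zero counter by one. */
    for (;;) {
        FreqSlot *s = &c->slots[c->hand];

        c->hand = (c->hand + 1) % NSLOTS;
        if (s->usage == 0) {
            int evicted = s->key;

            s->key = key;
            s->usage = 1;
            return evicted;
        }
        s->usage--;
    }
}

/* Fills the cache, makes key 1 hot and key 3 warm, then forces a miss. */
static int demo_frequency_eviction(void)
{
    FreqCache c = {0};

    for (int key = 1; key <= 4; key++)
        (void) freq_cache_access(&c, key);  /* fill all four slots */
    for (int i = 0; i < 3; i++)
        (void) freq_cache_access(&c, 1);    /* key 1 is used often */
    (void) freq_cache_access(&c, 3);        /* key 3 is used a little */
    return freq_cache_access(&c, 5);        /* miss: a cold entry must go */
}
```

In the demo, keys 2 and 4 are equally cold, and key 2 is evicted because the hand reaches it first once its counter has dropped to zero. The oldest entry, key 1, stays because it is used most often.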

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Let's make PostgreSQL multi-threaded

2023-06-13 Thread Konstantin Knizhnik




On 13.06.2023 10:55 AM, Kyotaro Horiguchi wrote:
> At Tue, 13 Jun 2023 09:55:36 +0300, Konstantin Knizhnik
> wrote in
> > Postgres backend is "thick" not because of large number of local
> > variables.
> > It is because of local caches: catalog cache, relation cache, prepared
> > statements cache,...
> > If they are not rewritten, then backend still may consume a lot of
> > memory even if it will be thread rather than process.
> > But threads simplify development of global caches, although it can be
> > done with DSM.
>
> With the process model, that local stuff is flushed out upon
> reconnection. If we switch to the thread model, we will need an
> expiration mechanism for that stuff.


We already have invalidation mechanism. It will be also used in case of 
shared cache, but we do not need to send invalidations to all backends.

I do not completely understand your point.
Right now caches (for example catalog cache) are not limited at all.
So if you have very large database schema, then this cache will consume
a lot of memory (multiplied by number of
backends). The fact that it is flushed out upon reconnection can not
help much: what if backends are not going to disconnect?


In case of shared cache we will have to address the same problem:
whether this cache should be limited (with some replacement discipline
as LRU).
Or it is unlimited. In case of shared cache, size of the cache is less
critical because it is not multiplied by number of backends.
So we can assume that catalog and relation cache should always fit in
memory (otherwise significant rewriting of all Postgres code working
with relations will be needed).


But Postgres also has temporary tables. For them we may need local
backend cache in any case.
Global temp table patch was not approved so we still have to deal with
these awful temp tables.


In any case I do not understand why we need some expiration mechanism
for these caches.
If there is some relation then information about this relation should be
kept in the cache as long as this relation is alive.
If there is not enough memory to cache information about all relations,
then we may need some replacement algorithm.
But I do not think that there is any sense to remove some item from the
cache just because it is too old.





Re: Let's make PostgreSQL multi-threaded

2023-06-13 Thread Kyotaro Horiguchi
At Tue, 13 Jun 2023 09:55:36 +0300, Konstantin Knizhnik  
wrote in 
> Postgres backend is "thick" not because of large number of local
> variables.
> It is because of local caches: catalog cache, relation cache, prepared
> statements cache,...
> If they are not rewritten, then backend still may consume a lot of
> memory even if it will be thread rather than process.
> But threads simplify development of global caches, although it can be
> done with DSM.

With the process model, that local stuff is flushed out upon
reconnection. If we switch to the thread model, we will need an
expiration mechanism for that stuff.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Let's make PostgreSQL multi-threaded

2023-06-13 Thread Konstantin Knizhnik




On 12.06.2023 3:23 PM, Pavel Borisov wrote:
> Is the following true or not?
>
> 1. If we switch processes to threads but leave the amount of session
> local variables unchanged, there would be hardly any performance gain.
> 2. If we move some backend's local variables into shared memory then
> the performance gain would be very near to what we get with threads
> having equal amount of session-local variables.
>
> In other words, the overall goal in principle is to gain from less
> memory copying wherever it doesn't add the burden of locks for
> concurrent variables access?
>
> Regards,
> Pavel Borisov,
> Supabase



IMHO both statements are not true.
Switching to threads will cause less context switch overhead (because
all threads share the same memory space and so preserve the TLB).
How big will this advantage be? In my prototype I got ~10%. But maybe
it is possible to find workloads where it is larger.


Postgres backend is "thick" not because of large number of local variables.
It is because of local caches: catalog cache, relation cache, prepared
statements cache,...
If they are not rewritten, then backend still may consume a lot of
memory even if it will be thread rather than process.
But threads simplify development of global caches, although it can be
done with DSM.






Re: Let's make PostgreSQL multi-threaded

2023-06-12 Thread Michael Paquier
On Mon, Jun 12, 2023 at 12:24:30PM -0700, Andres Freund wrote:
> Those points seem pretty much unrelated to the potential gains from switching
> to a threading model. The main advantages are:
> 
> 1) We'd gain from being able to share state more efficiently (using normal
>    pointers) and more dynamically (not needing to pre-allocate). That'd remove
>    a good amount of complexity. As an example, consider the work we need to do
>    to ferry tuples from one process to another. Even if we just continue to
>    use shm_mq, in a threading world we could just put a pointer in the queue,
>    but have the tuple data be shared between the processes etc.
> 
>    Eventually this could include removing the 1:1 connection<->process/thread
>    model. That's possible to do with processes as well, but considerably
>    harder.
> 
> 2) Making context switches cheaper / sharing more resources at the OS and
>    hardware level.

Yes.  FWIW, while reading the thread, parallel workers struck me as
the first area that would benefit from all that.  Could it be easier
to figure out the incremental pieces if working on a new node doing a
Gather based on threads, for instance?
--
Michael




Re: Let's make PostgreSQL multi-threaded

2023-06-12 Thread Heikki Linnakangas

On 10/06/2023 21:01, Hannu Krosing wrote:
> On Mon, Jun 5, 2023 at 4:52 PM Heikki Linnakangas  wrote:
> > If there are no major objections, I'm going to update the developer FAQ,
> > removing the excuses there for why we don't use threads [1].
>
> I think it is not wise to start the wholesale removal of the objections there.
>
> But I think it is worthwhile to revisit the section about threads and
> maybe split out the historic part which is no longer true, and provide
> both pros and cons for these.
>
> I started with this short summary from the discussion in this thread,
> feel free to expand, argue, fix :)
> * is current excuse
> -- is counterargument or ack


Thanks, that's a good idea.


> * Speed improvements using threads are small compared to the remaining
> backend startup time.
> -- we now have some measurements that show significant performance
> improvements not related to startup time


Also, I don't expect much performance gain directly from switching to 
threads. The point is that switching to a multi-threaded model makes 
possible, or at least greatly simplifies, a lot of other development. 
Which can then help with the backend startup time, among other things. 
For example, a shared catalog cache.



> * The backend code would be more complex.
> -- this is still the case


I don't quite buy that. A multi-threaded model isn't inherently more 
complex than a multi-process model. Just different. Sure, the transition 
period will be more complex, when we need to support both models. But in 
the long run, if we can remove the multi-process mode, we can make a lot 
of things *simpler*.



> -- even more worrisome is that all extensions also need to be rewritten


"rewritten" is an exaggeration. Yes, extensions will need to adapt, similar
to the core code. But I hope it will be pretty mechanical work, marking
global variables as thread-local and such. Many extensions will work
with little to no changes.



> -- and many incompatibilities will be silent and take potentially years to find


IMO this is the most scary part of all this. I'm optimistic that we can 
have enough compiler support and tooling to catch most issues. But we 
don't know for sure at this point.



* Terminating backend processes allows the OS to cleanly and quickly
free all resources, protecting against memory and file descriptor
leaks and making backend shutdown cheaper and faster
-- still true


Yep. I'm not too worried about PostgreSQL code, our memory contexts and 
resource owners are very good at stopping leaks. But 3rd party libraries 
could pose hard problems. IIRC we still have a leak with the LLVM JIT 
code, for example. We should fix that anyway, of course, but the 
multi-process model is more forgiving with leaks like that.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Let's make PostgreSQL multi-threaded

2023-06-12 Thread Andres Freund
Hi,

On 2023-06-12 16:23:14 +0400, Pavel Borisov wrote:
> Is the following true or not?
>
> 1. If we switch processes to threads but leave the amount of session
> local variables unchanged, there would be hardly any performance gain.

False.


> 2. If we move some backend's local variables into shared memory then
> the performance gain would be very near to what we get with threads
> having equal amount of session-local variables.

False.


> In other words, the overall goal in principle is to gain from less
> memory copying wherever it doesn't add the burden of locks for
> concurrent variables access?

False.

Those points seem pretty much unrelated to the potential gains from switching
to a threading model. The main advantages are:

1) We'd gain from being able to share state more efficiently (using normal
   pointers) and more dynamically (not needing to pre-allocate). That'd remove
   a good amount of complexity. As an example, consider the work we need to do
   to ferry tuples from one process to another. Even if we just continue to
   use shm_mq, in a threading world we could just put a pointer in the queue,
   but have the tuple data be shared between the processes etc.

   Eventually this could include removing the 1:1 connection<->process/thread
   model. That's possible to do with processes as well, but considerably
   harder.

2) Making context switches cheaper / sharing more resources at the OS and
   hardware level.
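
To make the first point more concrete, here is a minimal sketch contrasting
the two ways of handing a tuple to another worker. It is not the shm_mq API:
Tuple, CopySlot, PtrSlot and the send/receive helpers are hypothetical,
single-slot, and leave out all synchronization and tuple lifetime handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size tuple; real tuples are variable-length. */
typedef struct Tuple
{
    size_t len;
    char   data[64];
} Tuple;

/* Process model: the payload bytes are copied into a shared buffer. */
typedef struct CopySlot
{
    int    full;
    size_t len;
    char   bytes[64];
} CopySlot;

/* Thread model: only the address travels; the tuple stays where it is. */
typedef struct PtrSlot
{
    int          full;
    const Tuple *tuple;
} PtrSlot;

/* Copy the tuple payload into the slot.  Returns 1 on success, 0 if busy. */
int
copy_send(CopySlot *slot, const Tuple *t)
{
    if (slot->full || t->len > sizeof(slot->bytes))
        return 0;
    memcpy(slot->bytes, t->data, t->len);
    slot->len = t->len;
    slot->full = 1;
    return 1;
}

/* Copy the payload back out on the receiving side (second copy). */
int
copy_receive(CopySlot *slot, Tuple *out)
{
    if (!slot->full)
        return 0;
    memcpy(out->data, slot->bytes, slot->len);
    out->len = slot->len;
    slot->full = 0;
    return 1;
}

/* Store just the pointer.  Only valid when both sides share an address space. */
int
ptr_send(PtrSlot *slot, const Tuple *t)
{
    if (slot->full)
        return 0;
    slot->tuple = t;
    slot->full = 1;
    return 1;
}

/* Return the very same tuple the sender queued, or NULL if the slot is empty. */
const Tuple *
ptr_receive(PtrSlot *slot)
{
    if (!slot->full)
        return NULL;
    slot->full = 0;
    return slot->tuple;
}
```

The pointer variant only works because threads share one address space; with
separate processes the same pointer value is not guaranteed to be valid in
the receiver, which is why data is copied through the queue today.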

Greetings,

Andres Freund




Re: Let's make PostgreSQL multi-threaded

2023-06-12 Thread Pavel Borisov
Is the following true or not?

1. If we switch processes to threads but leave the amount of session
local variables unchanged, there would be hardly any performance gain.
2. If we move some backend's local variables into shared memory then
the performance gain would be very near to what we get with threads
having equal amount of session-local variables.

In other words, the overall goal in principle is to gain from less
memory copying wherever it doesn't add the burden of locks for
concurrent variables access?

Regards,
Pavel Borisov,
Supabase




Re: Let's make PostgreSQL multi-threaded

2023-06-12 Thread Joel Jacobson
On Mon, Jun 12, 2023, at 13:53, Tomas Vondra wrote:
> In a way, I think this "split into independently beneficial steps"
> strategy is the only option with a meaningful chance of success.

+1

/Joel




Re: Let's make PostgreSQL multi-threaded

2023-06-12 Thread Tomas Vondra



On 6/10/23 13:20, Dave Cramer wrote:
> 
> 
> On Fri, 9 Jun 2023 at 18:29, Stephen Frost wrote:
> 
> Greetings,
> 
> * Dave Cramer (davecramer@postgres.rocks) wrote:
> > One thing I can think of is upgrading. AFAIK dump and restore is the only
> > way to change the on disk format.
> > Presuming that eventually we will be forced to change the on disk format it
> > would be nice to be able to do so in a manner which does not force long
> > down times
> 
> There is an ongoing effort moving in this direction.  The $subject isn't
> great, but this patch set (which we are currently working on
> updating...): https://commitfest.postgresql.org/43/3986/ attempts
> changing a lot of currently compile-time block-size pieces to be
> run-time which would open up the possibility to have a different page
> format for, eg, different tablespaces.  Possibly even different block
> sizes.  We'd certainly welcome discussion from others who are
> interested.
> 
> Thanks,
> 
> Stephen
> 
> 
> Upgrading was just one example of difficult problems that need to be 
> addressed. My thought was that before we commit to something as
> potentially resource intensive as changing the threading model we
> compile a list of other "big issues" and prioritize.
> 

I doubt anyone expects the community to commit to the threading switch
in this sense - drop everything else and just start working on this
(pretty massive) change. Not going to happen.

> I realize open source is more of a scratch your itch kind of development
> model, but I'm not convinced the random walk that entails is the
> appropriate way to move forward. At the very least I'd like us to
> question it.

I may be missing something, but it's not clear to me whether you argue
for the open source approach or against it. I personally think it's
perfectly fine for people to work on scratching their itch and focus on
stuff that yields value to them (or their customers).

And I think the only way to succeed at the threading switch is within
this very framework - split it into (much) smaller steps that are
beneficial on their own and scratch some other itch.

For example, we have issues with a large number of connections and we've
discussed stuff like built-in connection pooling etc. for a very long
time (including this thread). But we have session state in various
places in process private memory, which makes it borderline impossible
and thus we don't have anything built-in. IIUC the threading would need
to isolate/define the session state anyway, so perhaps it could do it in
a way that'd also work for the connection pooling (with processes)?

Which would mean this particular change is immediately beneficial even
without the threading switch (which I'd expect to take a considerable
amount of time).
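
One way to picture "isolating the session state" is sketched below. This is
purely illustrative: the Session struct, its fields, session_init and
execute_in_session are hypothetical and not taken from any PostgreSQL patch.
The assumption is that state which today lives in process-private globals is
gathered into one object, so that a worker (process or thread) can pick up
whichever session currently has work.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for state that is currently spread over globals. */
typedef struct Session
{
    int  session_id;
    char search_path[64];   /* stands in for per-session settings */
    int  xact_depth;        /* stands in for transaction state */
    long statements_run;    /* stands in for per-session statistics */
} Session;

/* Initialize a session with its own settings. */
void
session_init(Session *s, int id, const char *search_path)
{
    memset(s, 0, sizeof(*s));
    s->session_id = id;
    snprintf(s->search_path, sizeof(s->search_path), "%s", search_path);
}

/*
 * "Execute" one statement against an explicitly passed session.  No global
 * variable is read or written, so the same worker can serve session A,
 * then session B, then session A again.
 * Returns the number of statements run in this session so far.
 */
long
execute_in_session(Session *s, const char *stmt)
{
    if (strcmp(stmt, "BEGIN") == 0)
        s->xact_depth++;
    else if (strcmp(stmt, "COMMIT") == 0 && s->xact_depth > 0)
        s->xact_depth--;
    s->statements_run++;
    return s->statements_run;
}
```

Nothing in this sketch depends on threads, which is the point made above: a
refactoring of this kind could be useful for built-in connection pooling with
processes as well.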

In a way, I think this "split into independently beneficial steps"
strategy is the only option with a meaningful chance of success.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Let's make PostgreSQL multi-threaded

2023-06-11 Thread Dilip Kumar
On Sat, Jun 10, 2023 at 11:32 PM Hannu Krosing  wrote:
>
> On Mon, Jun 5, 2023 at 4:52 PM Heikki Linnakangas  wrote:
> >
> > If there are no major objections, I'm going to update the developer FAQ,
> > removing the excuses there for why we don't use threads [1].
>
> I think it is not wise to start the wholesale removal of the objections there.
>
> But I think it is worthwhile to revisit the section about threads and
> maybe split out the historic part which is no longer true, and provide
> both pros and cons for these.
>
> I started with this short summary from the discussion in this thread,
> feel free to expand, argue, fix :)
> * is current excuse
> -- is counterargument or ack
> 
> As an example, threads are not yet used instead of multiple processes
> for backends because:
> * Historically, threads were poorly supported and buggy.
> -- yes they were, not relevant now when threads are well-supported and 
> non-buggy
>
> * An error in one backend can corrupt other backends if they're
> threads within a single process
> -- still valid for silent corruption
> -- for detected crash - yes, but we are restarting all backends in
> case of crash anyway.
>
> * Speed improvements using threads are small compared to the remaining
> backend startup time.
> -- we now have some measurements that show significant performance
> improvements not related to startup time
>
> * The backend code would be more complex.
> -- this is still the case
> -- even more worrisome is that all extensions also need to be rewritten
> -- and many incompatibilities will be silent and take potentially years to 
> find
>
> * Terminating backend processes allows the OS to cleanly and quickly
> free all resources, protecting against memory and file descriptor
> leaks and making backend shutdown cheaper and faster
> -- still true
>
> * Debugging threaded programs is much harder than debugging worker
> processes, and core dumps are much less useful
> -- this was countered by claiming that
>   -- by now we have reasonable debugger support for threads
>   -- there is no direct debugger support for debugging the exact
> system set up like PostgreSQL processes + shared memory
>
> * Sharing of read-only executable mappings and the use of
> shared_buffers means processes, like threads, are very memory
> efficient
> -- this seems to say that the current process model is as good as threads ?
> -- there were a few counterarguments
>   -- per-backend virtual memory mapping can add up to significant
> amount of extra RAM usage
>   -- the discussion did not yet touch various per-backend caches
> (pg_catalog cache, statement cache) which are arguably easier to
> implement in threaded model
>   -- TLB reload at each process switch is expensive and would be
> mostly avoided in case of threads

I think it is worth mentioning that parallel worker infrastructure
will be simplified with threaded models e.g. 'parallel query', and
'parallel vacuum'.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Let's make PostgreSQL multi-threaded

2023-06-10 Thread James Addison
I don't have an objection, but I do wonder: can one (or perhaps a few)
queries/workloads be provided where threading would be significantly
beneficial?

(some material there could help get people on-board with the idea and
potentially guide many of the smaller questions that arise along the
way)

On Mon, 5 Jun 2023 at 15:52, Heikki Linnakangas  wrote:
>
> I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
> so that the whole server runs in a single process, with multiple
> threads. It has been discussed many times in the past, last thread on
> pgsql-hackers was back in 2017 when Konstantin made some experiments [0].
>
> I feel that there is now pretty strong consensus that it would be a good
> thing, more so than before. Lots of work to get there, and lots of
> details to be hashed out, but no objections to the idea at a high level.
>
> The purpose of this email is to make that silent consensus explicit. If
> you have objections to switching from the current multi-process
> architecture to a single-process, multi-threaded architecture, please
> speak up.
>
> If there are no major objections, I'm going to update the developer FAQ,
> removing the excuses there for why we don't use threads [1]. And we can
> start to talk about the path to get there. Below is a list of some
> hurdles and proposed high-level solutions. This isn't an exhaustive
> list, just some of the most obvious problems:
>
> # Transition period
>
> The transition surely cannot be done fully in one release. Even if we
> could pull it off in core, extensions will need more time to adapt.
> There will be a transition period of at least one release, probably
> more, where you can choose multi-process or multi-thread model using a
> GUC. Depending on how it goes, we can document it as experimental at first.
>
> # Thread per connection
>
> To get started, it's most straightforward to have one thread per
> connection, just replacing backend process with a backend thread. In the
> future, we might want to have a thread pool with some kind of a
> scheduler to assign active queries to worker threads. Or multiple
> threads per connection, or spawn additional helper threads for specific
> tasks. But that's future work.
>
> # Global variables
>
> We have a lot of global and static variables:
>
> $ objdump -t bin/postgres | grep -e "\.data" -e "\.bss" | grep -v
> "data.rel.ro" | wc -l
> 1666
>
> Some of them are pointers to shared memory structures and can stay as
> they are. But many of them are per-connection state. The most
> straightforward conversion for those is to turn them into thread-local
> variables, like Konstantin did in [0].
>
> It might be good to have some kind of a Session context struct that we
> pass everywhere, or maybe have a single thread-local variable to hold
> it. Many of the global variables would become fields in the Session. But
> that's future work.
>
> # Extensions
>
> A lot of extensions also contain global variables or other things that
> break in a multi-threaded environment. We need a way to label extensions
> that support multi-threading. And in the future, also extensions that
> *require* a multi-threaded server.
>
> Let's add flags to the control file to mark if the extension is
> thread-safe and/or process-safe. If you try to load an extension that's
> not compatible with the server's mode, throw an error.
>
> We might need new functions in addition to _PG_init, called at connection
> startup and shutdown. And background worker API probably needs some changes.
>
> # Exposed PIDs
>
> We expose backend process PIDs to users in a few places.
> pg_stat_activity.pid and pg_terminate_backend(), for example. They need
> to be replaced, or we can assign a fake PID to each connection when
> running in multi-threaded mode.
>
> # Signals
>
> We use signals for communication between backends. SIGURG in latches,
> and SIGUSR1 in procsignal, for example. Those primitives need to be
> rewritten with some other signalling mechanism in multi-threaded mode.
> In principle, it's possible to set per-thread signal handlers, and send
> a signal to a particular thread (pthread_kill), but I think it's better
> to just rewrite them.
>
> We also document that you can send SIGINT, SIGTERM or SIGHUP to an
> individual backend process. I think we need to deprecate that, and maybe
> come up with some convenient replacement. E.g. send a message with
> backend ID to a unix domain socket, and a new pg_kill executable to send
> those messages.
>
> # Restart on crash
>
> If a backend process crashes, postmaster terminates all other backends
> and restarts the system. That's hard (impossible?) to do safely if
> everything runs in one process. We can continue to have a separate
> postmaster process that just monitors the main process and restarts it
> on crash.
>
> # Thread-safe libraries
>
> Need to switch to thread-safe versions of library functions, e.g.
> uselocale() instead of setlocale().
>
> The Python interpreter has a Global 

Re: Let's make PostgreSQL multi-threaded

2023-06-10 Thread Hannu Krosing
On Mon, Jun 5, 2023 at 4:52 PM Heikki Linnakangas  wrote:
>
> If there are no major objections, I'm going to update the developer FAQ,
> removing the excuses there for why we don't use threads [1].

I think it is not wise to start the wholesale removal of the objections there.

But I think it is worthwhile to revisit the section about threads and
maybe split out the historic part which is no longer true, and provide
both pros and cons for these.

I started with this short summary from the discussion in this thread,
feel free to expand, argue, fix :)
* is current excuse
-- is counterargument or ack

As an example, threads are not yet used instead of multiple processes
for backends because:
* Historically, threads were poorly supported and buggy.
-- yes they were, not relevant now when threads are well-supported and non-buggy

* An error in one backend can corrupt other backends if they're
threads within a single process
-- still valid for silent corruption
-- for detected crash - yes, but we are restarting all backends in
case of crash anyway.

* Speed improvements using threads are small compared to the remaining
backend startup time.
-- we now have some measurements that show significant performance
improvements not related to startup time

* The backend code would be more complex.
-- this is still the case
-- even more worrisome is that all extensions also need to be rewritten
-- and many incompatibilities will be silent and take potentially years to find

* Terminating backend processes allows the OS to cleanly and quickly
free all resources, protecting against memory and file descriptor
leaks and making backend shutdown cheaper and faster
-- still true

* Debugging threaded programs is much harder than debugging worker
processes, and core dumps are much less useful
-- this was countered by claiming that
  -- by now we have reasonable debugger support for threads
  -- there is no direct debugger support for debugging the exact
system set up like PostgreSQL processes + shared memory

* Sharing of read-only executable mappings and the use of
shared_buffers means processes, like threads, are very memory
efficient
-- this seems to say that the current process model is as good as threads ?
-- there were a few counterarguments
  -- per-backend virtual memory mapping can add up to significant
amount of extra RAM usage
  -- the discussion did not yet touch various per-backend caches
(pg_catalog cache, statement cache) which are arguably easier to
implement in threaded model
  -- TLB reload at each process switch is expensive and would be
mostly avoided in case of threads

* Regular creation and destruction of processes helps protect against
memory fragmentation, which can be hard to manage in long-running
processes
-- probably still true
-




Re: Let's make PostgreSQL multi-threaded

2023-06-10 Thread Dave Cramer
On Fri, 9 Jun 2023 at 18:29, Stephen Frost  wrote:

> Greetings,
>
> * Dave Cramer (davecramer@postgres.rocks) wrote:
> > One thing I can think of is upgrading. AFAIK dump and restore is the only
> > way to change the on disk format.
> > Presuming that eventually we will be forced to change the on disk format it
> > would be nice to be able to do so in a manner which does not force long
> > down times
>
> There is an ongoing effort moving in this direction.  The $subject isn't
> great, but this patch set (which we are currently working on
> updating...): https://commitfest.postgresql.org/43/3986/ attempts
> changing a lot of currently compile-time block-size pieces to be
> run-time which would open up the possibility to have a different page
> format for, eg, different tablespaces.  Possibly even different block
> sizes.  We'd certainly welcome discussion from others who are
> interested.
>
> Thanks,
>
> Stephen
>

Upgrading was just one example of difficult problems that need to be
addressed. My thought was that before we commit to something as
potentially resource intensive as changing the threading model we
compile a list of other "big issues" and prioritize.

I realize open source is more of a scratch your itch kind of development
model, but I'm not convinced the random walk that entails is the
appropriate way to move forward. At the very least I'd like us to
question it.

Dave


Re: Let's make PostgreSQL multi-threaded

2023-06-09 Thread Bruce Momjian
On Thu, Jun  8, 2023 at 11:37:00AM +1200, Thomas Munro wrote:
> It's old, but this describes the 4 main models and which well known
> RDBMSes use them in section 2.3:
> 
> https://dsf.berkeley.edu/papers/fntdb07-architecture.pdf
> 
> TL;DR DB2 is the winner, it can do process-per-connection,
> thread-per-connection, process-pool or thread-pool.
> 
> I understand this thread to be about thread-per-connection (= backend,
> session, socket) for now.

I am quite confused that few people seem to care about which model,
processes or threads, is better for Oracle, and how having both methods
available can be a reasonable solution to maintain.  Someone suggested
they abstracted the differences so the maintenance burden was minor, but
that seems very hard to me.

Did these vendors start with processes, add threads, and then find that
threads had downsides so they had to keep both?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Let's make PostgreSQL multi-threaded

2023-06-09 Thread Bruce Momjian
On Wed, Jun  7, 2023 at 06:38:38PM +0530, Ashutosh Bapat wrote:
> With multiple processes, we can use all the available cores (at least
> theoretically if all those processes are independent). But is that
> guaranteed with single process multi-thread model? Google didn't throw
> any definitive answer to that. Usually it depends upon the OS and
> architecture.
> 
> Maybe a good start is to start using threads instead of parallel
> workers e.g. for parallel vacuum, parallel query and so on while
> leaving the processes for connections and leaders. that itself might
> take significant time. Based on that experience move to a completely
> threaded model. Based on my experience with other similar products, I
> think we will settle on a multi-process multi-thread model.

I think we have a few known problems that we might be able to solve
without threads, but can help us eventually move to threads if we find
it useful:

1)  Use threads for background workers rather than processes
2)  Allow sessions to be stopped and started by saving their state

Ideally we would solve the problem of making shared structures
resizable, but I am not sure how that can be easily done without
threads.
 
-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Let's make PostgreSQL multi-threaded

2023-06-09 Thread Stephen Frost
Greetings,

* Dave Cramer (davecramer@postgres.rocks) wrote:
> One thing I can think of is upgrading. AFAIK dump and restore is the only
> way to change the on disk format.
> Presuming that eventually we will be forced to change the on disk format it
> would be nice to be able to do so in a manner which does not force long
> down times

There is an ongoing effort moving in this direction.  The $subject isn't
great, but this patch set (which we are currently working on
updating...): https://commitfest.postgresql.org/43/3986/ attempts
changing a lot of currently compile-time block-size pieces to be
run-time which would open up the possibility to have a different page
format for, eg, different tablespaces.  Possibly even different block
sizes.  We'd certainly welcome discussion from others who are
interested.

Thanks,

Stephen




Re: Let's make PostgreSQL multi-threaded

2023-06-09 Thread Matthias van de Meent
On Fri, 9 Jun 2023 at 17:20, Dave Cramer  wrote:
>
> This is somewhat orthogonal to the topic of threading but relevant to the use 
> of resources.
>
> If we are going to undertake some hard problems perhaps we should be looking 
> at other problems that solve other long term issues before we commit to 
> spending resources on changing the process model.

-1. This and that are orthogonal and effort in one does not need to
block the other. If someone is willing to put in the effort, let them.
Last time I checked we, as a project, are not blocking bugfixes for
new features in MAIN either (or vice versa).

> One thing I can think of is upgrading. AFAIK dump and restore is the only way 
> to change the on disk format.
> Presuming that eventually we will be forced to change the on disk format it 
> would be nice to be able to do so in a manner which does not force long down 
> times

I agree that we should improve our upgrade process (and we had a great
discussion on the topic at the PGCon Unconference last week), but in
my view that's not relevant to this discussion.

Kind regards,

Matthias van de Meent
Neon, Inc.




Re: Let's make PostgreSQL multi-threaded

2023-06-09 Thread Dave Cramer
This is somewhat orthogonal to the topic of threading but relevant to the
use of resources.

If we are going to undertake some hard problems perhaps we should be
looking at other problems that solve other long term issues before we
commit to spending resources on changing the process model.

One thing I can think of is upgrading. AFAIK dump and restore is the only
way to change the on disk format.
Presuming that eventually we will be forced to change the on disk format it
would be nice to be able to do so in a manner which does not force long
down times

 Dave



Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Stephan Doliov
This is an interesting message thread. I think in regards to the OP's call
to make PG multi-threaded, there should be a clear and identifiable
performance target and use cases for the target. How much performance boost
can be expected, and if so, in which data application context? Will queries
return faster for transactional use cases? analytic use cases? How much
data needs to be stored before one can observe the difference, or better
yet, a difference with a measurable impact on reduced cloud compute costs
as a % of compute cloud costs. I think if you can demonstrate for different
test datasets what those savings amount to, you can find momentum to
pursue it. Beyond that, even with better modern tooling for multi-threaded
development, it's obviously a big lift (may well be worth it!). Some of us
cagey old cats on this list (at least me) still have some work to do to
shed the baggage that previous pain of MT dev has caused us. :-)

Cheers,
Steve

On Thu, Jun 8, 2023 at 1:26 PM Andres Freund  wrote:

> Hi,
>
> On 2023-06-09 07:34:49 +1200, Thomas Munro wrote:
> > I wasn't in Matthew Wilcox's unconference in Ottawa but I found an
> > older article on LWN:
> >
> > https://lwn.net/Articles/895217/
> >
> > For what it's worth, FreeBSD hackers have studied this topic too (and
> > it's been done in Android and no doubt other systems before):
> >
> > https://www.cs.rochester.edu/u/sandhya/papers/ispass19.pdf
> >
> > I've shared that paper on this list before in the context of
> > super/huge pages and their benefits (to executable code, and to the
> > buffer pool), but a second topic in that paper is the idea of a shared
> > page table: "We find that sharing PTPs across different processes can
> > reduce execution cycles by as much as 6.9%. Moreover, the combined
> > effects of using superpages to map the main executable and sharing
> > PTPs for the small shared libraries can reduce execution cycles up to
> > 18.2%."  And that's just part of it, because those guys are more
> > interested in shared code/libraries and such so that's probably not
> > even getting to the stuff like buffer pool and DSMs that we might tend
> > to think of first.
>
> I've experimented with using huge pages for executable code on linux, and the
> benefits are quite noticeable:
>
> https://www.postgresql.org/message-id/20221104212126.qfh3yzi7luvyy5d6%40awork3.anarazel.de
>
> I'm a bit dubious that sharing the page table for executable code increases the
> benefit that much further in real workloads. I suspect the reason it was
> different for the authors of the paper is:
>
> > A fixed number of back-to-back
> > transactions are performed on a 5GB database, and we use the
> > -C option of pgbench to toggle between reconnecting after
> > each transaction (reconnect mode) and using one persistent
> > connection per client (persistent connection mode). We use
> > the reconnect mode by default unless stated otherwise.
>
> Using -C explains why you'd see a lot of benefit from sharing page tables for
> executable code. But I don't think -C is a particularly interesting workload
> to optimize for.
>
>
> > I'm no expert in this stuff, but it seems to be that with shared page
> > table schemes you can avoid wasting huge amounts of RAM on duplicated
> > page table entries (pages * processes), and with huge/super pages you
> > can reduce the number of pages, but AFAIK you still can't escape the
> > TLB shootdown cost, which is all-or-nothing (PCID level at best).
>
> Pretty much that. While you can avoid some TLB shootdowns via PCIDs, that only
> avoids flushing the TLB, it doesn't help with the TLB hit rate being much
> lower due to the number of "redundant" mappings with different PCIDs.
>
>
> > The only way to avoid TLB shootdowns on context switches is to have *exactly
> > the same memory map*.  Or, as Robert succinctly shouted, "THREADS".
>
> +1
>
> Greetings,
>
> Andres Freund
>


Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Andres Freund
Hi,

On 2023-06-09 07:34:49 +1200, Thomas Munro wrote:
> I wasn't in Matthew Wilcox's unconference in Ottawa but I found an
> older article on LWN:
> 
> https://lwn.net/Articles/895217/
> 
> For what it's worth, FreeBSD hackers have studied this topic too (and
> it's been done in Android and no doubt other systems before):
> 
> https://www.cs.rochester.edu/u/sandhya/papers/ispass19.pdf
> 
> I've shared that paper on this list before in the context of
> super/huge pages and their benefits (to executable code, and to the
> buffer pool), but a second topic in that paper is the idea of a shared
> page table: "We find that sharing PTPs across different processes can
> reduce execution cycles by as much as 6.9%. Moreover, the combined
> effects of using superpages to map the main executable and sharing
> PTPs for the small shared libraries can reduce execution cycles up to
> 18.2%."  And that's just part of it, because those guys are more
> interested in shared code/libraries and such so that's probably not
> even getting to the stuff like buffer pool and DSMs that we might tend
> to think of first.

I've experimented with using huge pages for executable code on linux, and the
benefits are quite noticeable:
https://www.postgresql.org/message-id/20221104212126.qfh3yzi7luvyy5d6%40awork3.anarazel.de

I'm a bit dubious that sharing the page table for executable code increases the
benefit that much further in real workloads. I suspect the reason it was
different for the authors of the paper is:

> A fixed number of back-to-back
> transactions are performed on a 5GB database, and we use the
> -C option of pgbench to toggle between reconnecting after
> each transaction (reconnect mode) and using one persistent
> connection per client (persistent connection mode). We use
> the reconnect mode by default unless stated otherwise.

Using -C explains why you'd see a lot of benefit from sharing page tables for
executable code. But I don't think -C is a particularly interesting workload
to optimize for.


> I'm no expert in this stuff, but it seems to be that with shared page
> table schemes you can avoid wasting huge amounts of RAM on duplicated
> page table entries (pages * processes), and with huge/super pages you
> can reduce the number of pages, but AFAIK you still can't escape the
> TLB shootdown cost, which is all-or-nothing (PCID level at best).

Pretty much that. While you can avoid some TLB shootdowns via PCIDs, that only
avoids flushing the TLB, it doesn't help with the TLB hit rate being much
lower due to the number of "redundant" mappings with different PCIDs.


> The only way to avoid TLB shootdowns on context switches is to have *exactly
> the same memory map*.  Or, as Robert succinctly shouted, "THREADS".

+1

Greetings,

Andres Freund




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Dave Cramer
On Thu, 8 Jun 2023 at 13:08, Hannu Krosing  wrote:

> I discovered this thread from a Twitter post "PostgreSQL will finally
> be rewritten in Rust"  :)
>

By the time we got around to finishing this, there would be a better
language to write it in.

Dave


Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Dmitry Dolgov
> On Mon, Jun 05, 2023 at 06:43:54PM +0300, Heikki Linnakangas wrote:
> On 05/06/2023 11:28, Tristan Partin wrote:
> > > # Exposed PIDs
> > >
> > > We expose backend process PIDs to users in a few places.
> > > pg_stat_activity.pid and pg_terminate_backend(), for example. They need
> > > to be replaced, or we can assign a fake PID to each connection when
> > > running in multi-threaded mode.
> >
> > Would it be possible to just transparently slot in the thread ID
> > instead?
>
> Perhaps. It might break applications that use the PID directly with e.g.
> 'kill ', though.

I think things are getting more interesting if some external resource
accounting like cgroups is taking place. From what I know cgroup v2 has
only a few controllers that allow threaded granularity, and memory or io
controllers are not part of this list. Since Postgres is doing quite a
lot of different things, sometimes it makes sense to put different
limitations on different types of activity, e.g. to give more priority
to a certain critical internal job at the cost of slowing down
backends. In the end it might be complicated or not possible to do that
for individual threads. Such cases are probably not very important from
the high level point of view, but could become an argument when deciding
what should be a process and what should be a thread.




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Thomas Munro
On Fri, Jun 9, 2023 at 4:00 AM Andres Freund  wrote:
> On 2023-06-08 12:15:58 +0200, Hannu Krosing wrote:
> > > This part was touched in the "AMA with a Linux Kernel Hacker"
> > > Unconference session where he mentioned that he had proposed a
> > > 'mshare' syscall for this.
>
> As-is that'd just lead to sharing page table, not the TLB. I don't think you
> currently do sharing of the TLB for parts of your address space on x86
> hardware. It's possible that something like that gets added to future
> hardware, but ...

I wasn't in Matthew Wilcox's unconference in Ottawa but I found an
older article on LWN:

https://lwn.net/Articles/895217/

For what it's worth, FreeBSD hackers have studied this topic too (and
it's been done in Android and no doubt other systems before):

https://www.cs.rochester.edu/u/sandhya/papers/ispass19.pdf

I've shared that paper on this list before in the context of
super/huge pages and their benefits (to executable code, and to the
buffer pool), but a second topic in that paper is the idea of a shared
page table: "We find that sharing PTPs across different processes can
reduce execution cycles by as much as 6.9%. Moreover, the combined
effects of using superpages to map the main executable and sharing
PTPs for the small shared libraries can reduce execution cycles up to
18.2%."  And that's just part of it, because those guys are more
interested in shared code/libraries and such so that's probably not
even getting to the stuff like buffer pool and DSMs that we might tend
to think of first.

I'm pretty sure PostgreSQL (along with other fork-based RDBMSs
mentioned in this thread) must be one of the worst offenders for page
table bloat, simply because we can have a lot of processes and touch a
lot of memory.
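
To put rough numbers on the page table bloat described above, here is a
back-of-the-envelope sketch. It assumes x86-64 with 8-byte page table
entries, counts only last-level entries, and assumes every process has
touched every page of the shared region. The function name and the sizes
in the usage note are illustrative, not measurements from this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate the memory spent on last-level page table entries when
 * nprocs processes each map, and fully touch, the same shared region.
 * Assumption: 8 bytes per entry (x86-64); upper levels are ignored.
 */
static uint64_t
pte_bytes(uint64_t region_bytes, uint64_t page_bytes, uint64_t nprocs)
{
    const uint64_t pte_size = 8;
    uint64_t    pages = (region_bytes + page_bytes - 1) / page_bytes;

    return pages * pte_size * nprocs;
}
```

Under those assumptions, a 64 GiB region with 4 KiB pages costs 128 MiB of
entries per process, so 1000 processes would need about 125 GiB of page
tables. With 2 MiB pages the same 1000 processes need about 250 MiB, and a
single shared page table (nprocs = 1) with 4 KiB pages needs 128 MiB.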

I'm no expert in this stuff, but it seems to me that with shared page
table schemes you can avoid wasting huge amounts of RAM on duplicated
page table entries (pages * processes), and with huge/super pages you
can reduce the number of pages, but AFAIK you still can't escape the
TLB shootdown cost, which is all-or-nothing (PCID level at best).  The
only way to avoid TLB shootdowns on context switches is to have
*exactly the same memory map*.  Or, as Robert succinctly shouted,
"THREADS".




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Jose Luis Tallon

On 8/6/23 15:56, Robert Haas wrote:

> Yeah, I've had similar thoughts. I'm not exactly sure what the
> advantages of such a refactoring might be, but the current structure
> feels pretty limiting. It works OK because we don't do anything in the
> postmaster other than fork a new backend, but I'm not sure if that's
> the best strategy. It means, for example, that if there's a ton of new
> connection requests, we're spawning a ton of new processes, which
> means that you can put a lot of load on a PostgreSQL instance even if
> you can't authenticate. Maybe we'd be better off with a pool of
> processes accepting connections; if authentication fails, that
> connection goes back into the pool and tries again.


    This. It's limited by connection I/O, hence a perfect use for 
threads (minimize per-connection overhead).


IMV, "session state" would be best stored/managed here. Would need a way 
to convey it efficiently, though.



> If authentication
> succeeds, either that process transitions to being a regular backend,
> leaving the authentication pool, or perhaps hands off the connection
> to a "real backend" at that point and loops around to accept() the
> next request.


Nicely done by passing the FD around

But at this point, we'd just get a nice reimplementation of a threaded 
connection pool inside Postgres :\
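
On Unix-like systems, passing the FD around as mentioned above is usually
done with an SCM_RIGHTS control message over a Unix-domain socket. The
following is a minimal sketch under that assumption; send_fd and recv_fd
are hypothetical helper names, not PostgreSQL functions, and Windows would
need a different mechanism.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
typedef union
{
    struct cmsghdr hdr;
    char        buf[CMSG_SPACE(sizeof(int))];
} fd_ctl;

/* Send descriptor fd over Unix-domain socket sock. Returns 0 or -1. */
static int
send_fd(int sock, int fd)
{
    char        byte = 0;       /* at least one data byte must be sent */
    struct iovec iov = {.iov_base = &byte, .iov_len = 1};
    fd_ctl      ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from sock. Returns the new fd, or -1. */
static int
recv_fd(int sock)
{
    char        byte;
    struct iovec iov = {.iov_base = &byte, .iov_len = 1};
    fd_ctl      ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int         fd;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL ||
        cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In an authentication-pool design, the accepting process would call
send_fd() with the client socket once authentication succeeds, and the
receiving backend would continue the session on the descriptor returned by
recv_fd().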



> Whether that's a good idea in detail or not, the point remains that
> having the postmaster handle this task is quite limiting. It forces us
> to hand off the connection to a new process at the earliest possible
> stage, so that the postmaster remains free to handle other duties.
> Giving the responsibility to another process would let us make
> decisions about where to perform the hand-off based on real
> architectural thought rather than being forced to do it a certain way
> because nothing else will work.


At least "tcop" surely feels like belonging in a separate process 


    J.L.





Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Thomas Munro
On Fri, Jun 9, 2023 at 5:02 AM Ilya Anfimov  wrote:
> Wouldn't all the memory operations require nearly the same
> shared memory allocators if someone switches to a threaded
> implementation?

It's true that we'd need concurrency-aware MemoryContext
implementations (details can be debated), but we wouldn't need that
address translation layer, which adds a measurable cost at every
access.
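
The address translation layer can be pictured as follows. When a segment
may sit at a different address in each process, data structures inside it
have to store offsets, and every dereference first adds the local base
address. This is a simplified sketch of that idea only; rel_ptr,
rel_to_abs and abs_to_rel are hypothetical names and this is not the
actual dsa.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relative pointer: byte offset from segment base, 0 = NULL. */
typedef uint64_t rel_ptr;

typedef struct node
{
    int         value;
    rel_ptr     next;           /* offset of next node, not an address */
} node;

/* Translate an offset into an address valid in this mapping. */
static inline void *
rel_to_abs(void *base, rel_ptr p)
{
    return p == 0 ? NULL : (char *) base + p;
}

/* Translate a local address back into an offset that can be stored. */
static inline rel_ptr
abs_to_rel(void *base, void *addr)
{
    return addr == NULL ? 0 : (rel_ptr) ((char *) addr - (char *) base);
}

/* Walk the list: each step pays for one extra translation. */
static int
sum_list(void *base, rel_ptr head)
{
    int         sum = 0;

    for (node *n = rel_to_abs(base, head); n != NULL;
         n = rel_to_abs(base, n->next))
        sum += n->value;
    return sum;
}
```

With a single shared address space, next could be a plain node pointer and
the loop would need no base argument and no per-access addition.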




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Andres Freund
Hi,

On 2023-06-08 11:56:13 -0400, Robert Haas wrote:
> On Thu, Jun 8, 2023 at 11:02 AM Hannu Krosing  wrote:
> > No, I meant that this needs to be fixed at OS level, by being able to
> > use the same mapping.
> >
> > We should not shy away from asking the OS people for adding the useful
> > features still missing.
> >
> > It was mentioned in the Unconference Kernel Hacker AMA talk and said
> > kernel hacker works for Oracle, and they also seemed to be needing
> > this :)
> 
> Fair enough, but we aspire to work on a bunch of different operating
> systems. To make use of an OS facility, we need something that works
> on at least Linux, Windows, macOS, and a few different BSD flavors.
> It's not as if when the PostgreSQL project asks for a new operating
> system facility everyone springs into action to provide it
> immediately. And even if they did, and even if they all released an
> implementation of whatever we requested next year, it would still be
> at least five, more realistically ten, years before systems with those
> facilities were ubiquitous.

I'm less concerned about this aspect - most won't have upgraded to a version
of postgres that benefits from threaded postgres in a similar timeframe. And if
the benefits are large enough, people will move.  But:


> And unless we have truly obscene amounts of clout in the OS community, it's
> likely that all of those different operating systems would implement
> different things to meet the stated need, and then we'd have to have a
> complex bunch of platform-dependent code in order to keep working on all of
> those systems.

And even more likely, they just won't do anything, because it's a model that
large parts of the industry have decided isn't going anywhere. It'd be one
thing if we had 5 kernel devs that we could deploy to work on this, but we
don't. So we have to convince kernel devs employed by others that somehow this
is an urgent enough thing that they should work on it. The likely, imo
justified, answer is just going to be: Fix your architecture, then we can
talk.

Greetings,

Andres Freund




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Andres Freund
Hi,

On 2023-06-08 17:55:57 +0200, Matthias van de Meent wrote:
> While I agree that "sharing page tables across processes" is useful,
> it looks like it'd be much more effort to correctly implement for e.g.
> DSM than implementing threading.
> Konstantin's diff is "only" 20.1k lines [0] added and/or modified,
> which is a lot, but it's manageable (13k+ of which are from files that
> were auto-generated and then committed, likely accidentally).

Honestly, I don't think this patch is in a good enough state to allow a
realistic estimation of the overall work. Making global variables TLS is the
*easy* part.  Redesigning postmaster, defining how to deal with extension
libraries, extension compatibility, developing tools to make developing a
threaded postgres feasible, dealing with freeing session lifetime memory
allocations that previously were freed via process exit, making the change
realistically reviewable, portability are all much harder.
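
To illustrate just the session-lifetime memory point: with one process per
connection the OS reclaims everything at exit, whereas a thread has to
arrange its own cleanup. The sketch below uses POSIX thread-specific data
with a destructor. All names and the 1 kB "session state" are
hypothetical, and a threaded PostgreSQL would presumably build on memory
contexts rather than on this mechanism.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

static pthread_key_t session_key;
static pthread_once_t session_key_once = PTHREAD_ONCE_INIT;
static atomic_int sessions_freed;   /* counts destructor runs */

/* Runs automatically when a thread with non-NULL session state exits. */
static void
session_destroy(void *state)
{
    free(state);
    atomic_fetch_add(&sessions_freed, 1);
}

static void
make_session_key(void)
{
    pthread_key_create(&session_key, session_destroy);
}

/* Thread entry point standing in for one client session. */
static void *
session_main(void *arg)
{
    char       *state;

    (void) arg;
    pthread_once(&session_key_once, make_session_key);

    state = malloc(1024);       /* hypothetical per-session state */
    if (state == NULL)
        return NULL;
    pthread_setspecific(session_key, state);

    /* ... session work would happen here ... */

    return NULL;                /* no explicit free: the destructor runs */
}

/* Start one session thread and wait for it. Returns 0 or -1. */
static int
run_one_session(void)
{
    pthread_t   t;

    if (pthread_create(&t, NULL, session_main, NULL) != 0)
        return -1;
    return pthread_join(t, NULL) == 0 ? 0 : -1;
}
```

Anything not registered this way (or tracked by some other mechanism)
leaks for the lifetime of the server, which is the part that process exit
used to hide.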

Greetings,

Andres Freund




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Andres Freund
Hi,

On 2023-06-08 17:02:08 +0200, Hannu Krosing wrote:
> On Thu, Jun 8, 2023 at 4:56 PM Robert Haas  wrote:
> >
> > On Thu, Jun 8, 2023 at 8:44 AM Hannu Krosing  wrote:
> > > > That sounds like a bad idea, dynamic shared memory is more expensive
> > > > to maintain than our static shared memory systems, not in the least
> > > > because DSM is not guaranteed to share the same addresses in each
> > > > process' address space.
> > >
> > > Then this too needs to be fixed
> >
> > Honestly, I'm struggling to respond to this non-sarcastically. I mean,
> > I was the one who implemented DSM. Do you think it works the way that
> > it works because I considered doing something smart and decided to do
> > something dumb instead?
>
> No, I meant that this needs to be fixed at OS level, by being able to
> use the same mapping.
>
> We should not shy away from asking the OS people for adding the useful
> features still missing.

There's a large part of this that is about hardware, not software. And
honestly, for most of the problems the answer is to just use threads. Adding
complexity to operating systems to make odd architectures like postgres'
better is a pretty dubious proposition.

I don't think we have even remotely enough influence on CPU design to make
e.g. *partial* TLB sharing across processes a thing.


> It was mentioned in the Unconference Kernel Hacker AMA talk and said
> kernel hacker works for Oracle, and they also seemed to be needing
> this :)

The proposals around that don't really help us all that much. Sharing the page
table will be a bit more efficient, but it won't really change anything
dramatically.  From what I understand they are primarily interested in
changing properties of a memory mapping across multiple processes, e.g. making
some memory executable and have that reflected in all processes. I don't think
this will help us much.

Greetings,

Andres Freund




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Hannu Krosing
I discovered this thread from a Twitter post "PostgreSQL will finally
be rewritten in Rust"  :)

On Mon, Jun 5, 2023 at 5:18 PM Tom Lane  wrote:
>
> Heikki Linnakangas  writes:
> > I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
> > so that the whole server runs in a single process, with multiple
> > threads. It has been discussed many times in the past, last thread on
> > pgsql-hackers was back in 2017 when Konstantin made some experiments [0].
>
> > I feel that there is now pretty strong consensus that it would be a good
> > thing, more so than before. Lots of work to get there, and lots of
> > details to be hashed out, but no objections to the idea at a high level.
>
> > The purpose of this email is to make that silent consensus explicit. If
> > you have objections to switching from the current multi-process
> > architecture to a single-process, multi-threaded architecture, please
> > speak up.
>
> For the record, I think this will be a disaster.  There is far too much
> code that will get broken, largely silently, and much of it is not
> under our control.
>
> regards, tom lane
>
>




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Ilya Anfimov
On Wed, Jun 07, 2023 at 10:26:07AM +1200, Thomas Munro wrote:
> On Tue, Jun 6, 2023 at 6:52 AM Andrew Dunstan  wrote:
> > If we were starting out today we would probably choose a threaded 
> > implementation. But moving to threaded now seems to me like a 
> > multi-year-multi-person project with the prospect of years to come chasing 
> > bugs and the prospect of fairly modest advantages. The risk to reward 
> > doesn't look great.
> >
> > That's my initial reaction. I could be convinced otherwise.
> 
> Here is one thing I often think about when contemplating threads.
> Take a look at dsa.c.  It calls itself a shared memory allocator, but
> really it has two jobs, the second being to provide software emulation
> of virtual memory.  That's behind dshash.c and now the stats system,
> and various parts of the parallel executor code.  It's slow and
> complicated, and far from the state of the art.  I wrote that code
> (building on allocator code from Robert) with the expectation that it
> was a transitional solution to unblock a bunch of projects.  I always
> expected that we'd eventually be deleting it.  When I explain that
> subsystem to people who are not steeped in the lore of PostgreSQL, it
> sounds completely absurd.  I mean, ... it is, right?  My point is

Wouldn't all the memory operations require nearly the same
shared memory allocators if someone switches to a threaded
implementation?

> that we're doing pretty unreasonable and inefficient contortions to
> develop new features -- we're not just happily chugging along without
> threads at no cost.
> 




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Andres Freund
Hi,

On 2023-06-08 16:47:48 +0300, Konstantin Knizhnik wrote:
> Actually TLS is not more expensive than accessing struct fields (at least
> on the x86 platform), consider the following program:

It really depends on the OS and the architecture, not just the
architecture. And even on x86-64 Linux, the fact that you're using the segment
offset in the address calculation means you can't use the more complicated
addressing modes for other reasons. And plenty of instructions, e.g. most (?) SSE
instructions, won't be able to use that kind of addressing directly.

Even just compiling your example, you can see that with gcc -O2 you get
considerably faster code with the non-TLS version.

As a fairly extreme example, here's the mingw -O3 compiled code:

use_struct:
movqxmm1, QWORD PTR .LC0[rip]
movqxmm0, QWORD PTR [rcx]
add DWORD PTR 8[rcx], 1
paddd   xmm0, xmm1
movqQWORD PTR [rcx], xmm0
ret
use_tls:
sub rsp, 40
lea rcx, __emutls_v.a[rip]
call__emutls_get_address
lea rcx, __emutls_v.b[rip]
add DWORD PTR [rax], 1
call__emutls_get_address
lea rcx, __emutls_v.c[rip]
add DWORD PTR [rax], 1
call__emutls_get_address
add DWORD PTR [rax], 1
add rsp, 40
ret
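
The program under discussion is not reproduced in this archive. A minimal
C sketch that is consistent with the symbol names in the listing above
might look like the following. It assumes three int counters named a, b
and c, and the struct layout is a guess.

```c
#include <assert.h>

/* Counters kept in an ordinary struct passed by pointer. */
typedef struct counters
{
    int         a;
    int         b;
    int         c;
} counters;

/* The same counters as thread-local variables (C11 _Thread_local). */
_Thread_local int a;
_Thread_local int b;
_Thread_local int c;

/* Three increments relative to one base register. */
void
use_struct(counters *s)
{
    s->a++;
    s->b++;
    s->c++;
}

/*
 * Three increments of thread-local storage. How each address is found
 * depends on the platform's TLS model; with emulated TLS every access
 * can turn into a function call, as in the mingw output above.
 */
void
use_tls(void)
{
    a++;
    b++;
    c++;
}
```

Compiling this with something like "gcc -O2 -S" on the platform of
interest and comparing the two functions is a quick way to see what TLS
access costs there.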

Greetings,

Andres Freund




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Greg Sabino Mullane
On Thu, Jun 8, 2023 at 8:44 AM Hannu Krosing  wrote:

> Do we have any statistics for the distribution of our user base ?
>
> My gut feeling says that for performance-critical use the non-Linux is
> in low single digits at best.
>

Stats are probably not possible, but based on years of consulting, as well
as watching places like SO, Slack, IRC, etc. over the years, IMO that's a
very accurate gut feeling. I'd hazard 1% or less for non-Linux systems.

Cheers,
Greg


Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Andres Freund
Hi,

On 2023-06-08 12:15:58 +0200, Hannu Krosing wrote:
> On Thu, Jun 8, 2023 at 11:54 AM Hannu Krosing  wrote:
> >
> > On Wed, Jun 7, 2023 at 11:37 PM Andres Freund  wrote:
> > >
> > > Hi,
> > >
> > > On 2023-06-05 13:40:13 -0400, Jonathan S. Katz wrote:
> > > > 2. While I wouldn't want to necessarily discourage a moonshot effort, I
> > > > would ask if developer time could be better spent on tackling some of the
> > > > other problems around vertical scalability? Per some PGCon discussions,
> > > > there's still room for improvement in how PostgreSQL can best utilize
> > > > resources available on very large "commodity" machines (a 448-core / 24TB RAM
> > > > instance comes to mind).
> > >
> > > I think we're starting to hit quite a few limits related to the process model,
> > > particularly on bigger machines. The overhead of cross-process context
> > > switches is inherently higher than switching between threads in the same
> > > process - and my suspicion is that that overhead will continue to
> > > increase. Once you have a significant number of connections we end up spending
> > > a *lot* of time in TLB misses, and that's inherent to the process model,
> > > because you can't share the TLB across processes.
> >
> >
> > This part was touched in the "AMA with a Linux Kernel Hacker"
> > Unconference session where he mentioned that he had proposed a
> > 'mshare' syscall for this.

As-is that'd just lead to sharing page table, not the TLB. I don't think you
currently do sharing of the TLB for parts of your address space on x86
hardware. It's possible that something like that gets added to future
hardware, but ...


> Also, the *static* huge pages already let you solve this problem now
> by sharing the page tables

You don't share the page tables with huge pages on linux.


- Andres




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Andres Freund
On 2023-06-08 14:01:16 +0200, Jose Luis Tallon wrote:
> * For "heavyweight" queries, the scalability of "almost independent"
> processes w.r.t. NUMA is just _impossible to achieve_ (locality of
> reference!) with a pure threaded system. When CPU+mem-bound
> (bandwidth-wise), threads add nothing IMO.

I don't think this is true in any sort of way.




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Robert Haas
On Thu, Jun 8, 2023 at 11:02 AM Hannu Krosing  wrote:
> No, I meant that this needs to be fixed at OS level, by being able to
> use the same mapping.
>
> We should not shy away from asking the OS people for adding the useful
> features still missing.
>
> It was mentioned in the Unconference Kernel Hacker AMA talk and said
> kernel hacker works for Oracle, and they also seemed to be needing
> this :)

Fair enough, but we aspire to work on a bunch of different operating
systems. To make use of an OS facility, we need something that works
on at least Linux, Windows, macOS, and a few different BSD flavors.
It's not as if when the PostgreSQL project asks for a new operating
system facility everyone springs into action to provide it
immediately. And even if they did, and even if they all released an
implementation of whatever we requested next year, it would still be
at least five, more realistically ten, years before systems with those
facilities were ubiquitous. And unless we have truly obscene amounts
of clout in the OS community, it's likely that all of those different
operating systems would implement different things to meet the stated
need, and then we'd have to have a complex bunch of platform-dependent
code in order to keep working on all of those systems.

To me, this is a road to nowhere. I have no problem at all with us
expressing our needs to the OS community, but realistically, any
PostgreSQL feature that depends on an OS feature less than twenty
years old is going to have to be optional, which means that if we want
to do anything about sharing address space mappings in the next few
years, it's going to need to be based on threads.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Matthias van de Meent
On Thu, 8 Jun 2023 at 17:02, Hannu Krosing  wrote:
>
> On Thu, Jun 8, 2023 at 4:56 PM Robert Haas  wrote:
> >
> > On Thu, Jun 8, 2023 at 8:44 AM Hannu Krosing  wrote:
> > > > That sounds like a bad idea, dynamic shared memory is more expensive
> > > > to maintain than our static shared memory systems, not in the least
> > > > because DSM is not guaranteed to share the same addresses in each
> > > > process' address space.
> > >
> > > Then this too needs to be fixed
> >
> > Honestly, I'm struggling to respond to this non-sarcastically. I mean,
> > I was the one who implemented DSM. Do you think it works the way that
> > it works because I considered doing something smart and decided to do
> > something dumb instead?
>
> No, I meant that this needs to be fixed at OS level, by being able to
> use the same mapping.
>
> We should not shy away from asking the OS people for adding the useful
> features still missing.

While I agree that "sharing page tables across processes" is useful,
it looks like it'd be much more effort to correctly implement for e.g.
DSM than implementing threading.
Konstantin's diff is "only" 20.1k lines [0] added and/or modified,
which is a lot, but it's manageable (13k+ of which are from files that
were auto-generated and then committed, likely accidentally).

> It was mentioned in the Unconference Kernel Hacker AMA talk and said
> kernel hacker works for Oracle, and they also seemed to be needing
> this :)

Though these new kernel features allowing for better performance
(mostly in kernel memory usage, probably) would be nice to have, we
wouldn't get performance benefits for older kernels, benefits which we
would get if we were to implement threading.
I'm not on board with a policy of us twiddling thumbs and waiting for
the OS to fix our architectural performance issues. Sure, the kernel
could optimize for our usage pattern, but I think that's not something
we can (or should) rely on for performance ^1.

Kind regards,

Matthias van de Meent

[0] 
https://github.com/postgrespro/postgresql.pthreads/compare/801386af...d5933309?w=1
^1 OT: I think the same about us (ab)using the OS page cache, but
that's a tale for a different time and thread.




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Andres Freund
On 2023-06-08 10:33:26 -0400, Greg Stark wrote:
> On Wed, 7 Jun 2023 at 18:09, Andres Freund  wrote:
> > Having the same memory mapping between threads allows the
> > hardware to share the TLB (on x86 via process context identifiers), which
> > isn't realistically possible with different processes.
> 
> As a matter of historical interest Solaris actually did implement this
> across different processes. It was called by the somewhat unfortunate
> name "Intimate Shared Memory". I don't think Linux ever implemented
> anything like it but I'm not sure.

I don't think it shared the TLB - it did share page tables though.




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Matthias van de Meent
On Thu, 8 Jun 2023 at 14:44, Hannu Krosing  wrote:
>
> On Thu, Jun 8, 2023 at 2:15 PM Matthias van de Meent
>  wrote:
> >
> > On Thu, 8 Jun 2023 at 11:54, Hannu Krosing  wrote:
> > >
> > > This part was touched in the "AMA with a Linux Kernel Hacker"
> > > Unconference session where he mentioned that he had proposed a
> > > 'mshare' syscall for this.
> > >
> > > So maybe a more fruitful way to fixing the perceived issues with
> > > process model is to push for small changes in Linux to overcome these
> > > avoiding a wholesale rewrite ?
> >
> > We support not just Linux, but also Windows and several (?) BSDs. I'm
> > not against pushing Linux to make things easier for us, but Linux is
> > an open source project, too, where someone needs to put in time to get
> > the shiny things that you want. And I'd rather see our time spent in
> > PostgreSQL, as Linux is only used by a part of our user base.
>
> Do we have any statistics for the distribution of our user base ?
>
> My gut feeling says that for performance-critical use the non-Linux is
> in low single digits at best.
>
> My fascination for OpenSource started with realisation that instead of
> workarounds you can actually fix the problem at source. So if the
> specific problem is that TLB is not shared then the proper fix is
> making it shared instead of rewriting everything else to get around
> it. None of us is limited to writing code in PostgreSQL only. If the
> easiest and more generic fix can be done in Linux then so be it.

TLB is a CPU hardware facility, not something that the OS can decide
to share between processes. While sharing (some) OS memory management
facilities across threads might be possible (as you mention, that
mshare syscall would be an example), that doesn't solve the issue of
the hardware not supporting sharing TLB entries across processes. We'd
use less kernel memory for memory management, but the CPU would still
stall on TLB misses every time we switch processes on the CPU (unless
we somehow were able to use non-process-namespaced TLB entries, which
would make our processes not meaningfully different from threads
w.r.t. address space).

> > >
> > > Maybe we can already remove the distinction between static and dynamic
> > > shared memory ?
> >
> > That sounds like a bad idea, dynamic shared memory is more expensive
> > to maintain than our static shared memory systems, not in the least
> > because DSM is not guaranteed to share the same addresses in each
> > process' address space.
>
> Then this too needs to be fixed

That needs kernel facilities in all (most?) supported OSes, and I
think that's much more work than moving to threads:
Allocations from the kernel are arbitrarily random across the
available address space, so a DSM segment that is allocated in one
backend might overlap with unshared allocations of a different
backend, making those backends have conflicting memory address spaces.
The only way to make that work is to have a single shared address
space in which some backends simply do not have a given allocation
mapped into their local address space, which seems only slightly more
isolated than threads and much more effort to maintain.

> > > Though I already heard some complaints at the conference discussions
> > > that having the dynamic version available has made some developers
> > > sloppy in using it resulting in wastefulness.
> >
> > Do you know any examples of this wastefulness?
>
> No. Just somebody mentioned it in a hallway conversation and the rest
> of the developers present mumbled approvingly :)

The only "wastefulness" that I know of in our use of DSM is the queue,
and that's by design: We need to move data from a backend's private
memory to memory that's accessible to other backends; i.e. shared
memory. You can't do that without copying or exposing your private
memory.

> > > Still we should be focusing our attention at solving the issues and
> > > not at "moving to threads" and hoping this will fix the issues by
> > > itself.
> >
> > I suspect that it is much easier to solve some of the issues when
> > working in a shared address space.
>
> Probably. But it would come at the cost of needing to change a lot of
> other parts of PostgreSQL.
>
> I am not against making code cleaner for potential threaded model
> support. I am just a bit sceptical about the actual switch being easy,
> or doable in the next 10-15 years.

PostgreSQL only has a support cycle of 5 years. 5 years after the last
release of un-threaded PostgreSQL we could drop support for "legacy"
extension models that don't support threading.

> > E.g. resizing shared_buffers is difficult right now due to the use of
> > a static allocation of shared memory, but if we had access to a single
> > shared address space, it'd be easier to do any cleanup necessary for
> > dynamically increasing/decreasing its size.
>
> This again could be done with shared memory mapping + dynamic shared memory.

Yes, but as I said, that's much more difficult than lock and/or 

Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Hannu Krosing
On Thu, Jun 8, 2023 at 4:56 PM Robert Haas  wrote:
>
> On Thu, Jun 8, 2023 at 8:44 AM Hannu Krosing  wrote:
> > > That sounds like a bad idea, dynamic shared memory is more expensive
> > > to maintain than our static shared memory systems, not in the least
> > > because DSM is not guaranteed to share the same addresses in each
> > > process' address space.
> >
> > Then this too needs to be fixed
>
> Honestly, I'm struggling to respond to this non-sarcastically. I mean,
> I was the one who implemented DSM. Do you think it works the way that
> it works because I considered doing something smart and decided to do
> something dumb instead?

No, I meant that this needs to be fixed at OS level, by being able to
use the same mapping.

We should not shy away from asking the OS people for adding the useful
features still missing.

It was mentioned in the Unconference Kernel Hacker AMA talk and said
kernel hacker works for Oracle, and they also seemed to be needing
this :)




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Robert Haas
On Thu, Jun 8, 2023 at 8:44 AM Hannu Krosing  wrote:
> > That sounds like a bad idea, dynamic shared memory is more expensive
> > to maintain than our static shared memory systems, not in the least
> > because DSM is not guaranteed to share the same addresses in each
> > process' address space.
>
> Then this too needs to be fixed

Honestly, I'm struggling to respond to this non-sarcastically. I mean,
I was the one who implemented DSM. Do you think it works the way that
it works because I considered doing something smart and decided to do
something dumb instead?

Suppose you have two PostgreSQL backends A and B. If we're not running
on Windows, each of these was forked from the postmaster, so things
like the text and data segments and the main shared memory segment are
going to be mapped at the same address in both processes, because they
inherit those mappings from the postmaster. However, additional things
can get mapped into the address space of either process later. This
can happen in a variety of ways. For instance, a shared library can
get loaded into one process and not the other. Or it can get loaded
into both processes but at different addresses - keep in mind that
it's the OS, not PostgreSQL, that decides what address to use when
loading a shared library. Or, if one process allocates a bunch of
memory, then new address space will have to be mapped into that
process to handle those memory allocations and, again, it is the OS
that decides where to put those mappings. So over time the memory
mappings of these two processes can diverge arbitrarily. That means
that if the same DSM has to be mapped into both processes, there is no
guarantee that it can be placed at the same address in both processes.
The address that gets used in one process might not be available in
the other process.

It's worth pointing out here that there are no portable primitives
available for a process to examine what memory segments are mapped
into its address space. I think it's probably possible on every OS,
but it works differently on different ones. Linux exposes such details
through /proc, for example, but macOS doesn't have /proc. So if we're
using standard, portable primitives, we can't even TRY to put the DSM
at the same address in every process that maps it. But even if we used
non-portable primitives to examine what's mapped into the address
space of every process, it wouldn't solve the problem. Suppose 4
processes want to share a DSM, so they all run around and use
non-portable OS-specific interfaces to figure out where there's a free
chunk of address space large enough to accommodate that DSM and they
all map it there. Hooray! But then say a fifth process comes along and
it ALSO wants to map that DSM, but in that fifth process the address
space that was available in the other four processes has already been
used by something else. Well, now we're screwed.
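
The effect is easy to reproduce even inside a single process. In the
sketch below, mapping the same object twice stands in for two backends
mapping the same DSM segment. The two views share their contents but
necessarily sit at different addresses, so a raw pointer stored in the
segment is only meaningful to the view that wrote it. This assumes POSIX
mmap(); map_twice is a hypothetical helper and a temporary file stands in
for a real shared memory object.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map the same temporary file twice with MAP_SHARED and return both
 * views. The kernel picks both addresses. Returns 0 on success, -1 on
 * failure.
 */
static int
map_twice(size_t len, char **view1, char **view2)
{
    FILE       *f = tmpfile();
    void       *p1;
    void       *p2;
    int         fd;

    if (f == NULL)
        return -1;
    fd = fileno(f);
    if (ftruncate(fd, (off_t) len) != 0)
    {
        fclose(f);
        return -1;
    }

    p1 = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    p2 = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    fclose(f);                  /* mappings outlive the descriptor */

    if (p1 == MAP_FAILED || p2 == MAP_FAILED)
    {
        if (p1 != MAP_FAILED)
            munmap(p1, len);
        if (p2 != MAP_FAILED)
            munmap(p2, len);
        return -1;
    }

    *view1 = p1;
    *view2 = p2;
    return 0;
}
```

A byte written through one view is visible through the other, but an
address taken in the first view does not point into the second. Across
real backends the same thing happens, except that no backend controls
where the other one's mapping ends up.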

The fact that DSM is expensive and awkward to use isn't a defect in
the implementation of DSM. It's a consequence of the fact that the
address space mappings in one PostgreSQL backend can be almost
arbitrarily different from the address space mappings in another
PostgreSQL backend. If only there were some kind of OS feature
available that would allow us to set things up so that all of the
PostgreSQL backends shared the same address space mappings!

Oh, right, there is: THREADS.

The fact that we don't use threads is the reason why DSM sucks and has
to suck. In fact it's the reason why DSM has to exist at all. Saying
"fix DSM instead of using threads" is roughly in the same category as
saying "if the peasants are revolting because they have no bread, then
let them eat cake." Both statements evince a complete failure to
understand the actual source of the problem.

With apologies for my grumpiness,

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Greg Stark
On Wed, 7 Jun 2023 at 18:09, Andres Freund  wrote:
> Having the same memory mapping between threads allows the
> hardware to share the TLB (on x86 via process context identifiers), which
> isn't realistically possible with different processes.

As a matter of historical interest Solaris actually did implement this
across different processes. It was called by the somewhat unfortunate
name "Intimate Shared Memory". I don't think Linux ever implemented
anything like it but I'm not sure.

I think this was not so much about cache hit rate but about just sheer
wasted memory in page mappings. So I guess hugepages more or less
target the same issues. But I find it interesting that they were
already running into issues like this 20 years ago -- presumably those
issues have only grown.
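
To put rough numbers on the "wasted memory in page mappings" point,
here is a small back-of-the-envelope sketch. It assumes 8-byte leaf
page-table entries (as on x86-64), ignores the upper levels of the
page-table tree, and assumes nothing is shared between processes. The
function names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8-byte leaf page-table entries, as on x86-64. */
#define PTE_BYTES 8ULL

/*
 * Bytes of leaf page-table entries ONE process needs in order to map
 * mapped_bytes of memory with pages of page_bytes each.
 */
uint64_t leaf_pte_bytes(uint64_t mapped_bytes, uint64_t page_bytes)
{
    assert(page_bytes > 0);
    uint64_t pages = (mapped_bytes + page_bytes - 1) / page_bytes;
    return pages * PTE_BYTES;
}

/* The same cost multiplied over nprocs processes with private tables. */
uint64_t total_pte_bytes(uint64_t mapped_bytes, uint64_t page_bytes,
                         uint64_t nprocs)
{
    return leaf_pte_bytes(mapped_bytes, page_bytes) * nprocs;
}
```

For 8 GiB of shared memory, 4 KiB pages cost 16 MiB of leaf entries in
every process that touches all of it, so about 15.6 GiB across 1000
such processes. With 2 MiB pages the per-process figure drops to
32 KiB.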

-- 
greg




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Robert Haas
On Wed, Jun 7, 2023 at 5:45 PM Andres Freund  wrote:
> People have argued that the process model is more robust. But it turns out
> that we have to crash-restart for just about any "bad failure" anyway. It used
> to be (a long time ago) that we didn't, but that was just broken.

How hard have you thought about memory leaks as a failure mode? Or
file descriptor leaks?

Right now, a process needs to release all of its shared resources
before exiting, or trigger a crash-and-restart cycle. But it doesn't
need to release any process-local resources, because the OS will take
care of that. But that wouldn't be true any more, and that seems like
it might require fixing quite a few things.
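
As a rough illustration of the kind of bookkeeping this implies, here
is a minimal sketch of explicit descriptor tracking that a thread
would have to run at exit, since the OS would no longer close
anything for it. FdOwner and its functions are hypothetical names;
this is not PostgreSQL's resource owner machinery.

```c
#include <assert.h>
#include <unistd.h>

#define MAX_TRACKED_FDS 32

/* Descriptors opened on behalf of one session/thread. */
typedef struct {
    int fds[MAX_TRACKED_FDS];
    int count;
} FdOwner;

void fd_owner_init(FdOwner *o)
{
    assert(o != NULL);
    o->count = 0;
}

/* Remember a descriptor; returns 0 on success, -1 if the table is full. */
int fd_owner_remember(FdOwner *o, int fd)
{
    if (o->count >= MAX_TRACKED_FDS)
        return -1;
    o->fds[o->count++] = fd;
    return 0;
}

/*
 * Close everything still tracked. In the process model, exit() does
 * this implicitly; in a threaded model something like this has to run
 * on every exit path. Returns the number of descriptors closed.
 */
int fd_owner_release_all(FdOwner *o)
{
    int closed = 0;
    while (o->count > 0) {
        if (close(o->fds[--o->count]) == 0)
            closed++;
    }
    return closed;
}
```

Any descriptor that is opened without going through
fd_owner_remember() leaks for the lifetime of the whole server, which
is the failure mode the question above is about.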

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Robert Haas
On Wed, Jun 7, 2023 at 5:39 PM Peter Eisentraut  wrote:
> On 07.06.23 23:30, Andres Freund wrote:
> > Yea, we definitely need the supervisor function in a separate
> > process. Presumably that means we need to split off some of the postmaster
> > responsibilities - e.g. I don't think it'd make sense to handle connection
> > establishment in the supervisor process. I wonder if this is something that
> > could end up being beneficial even in the process world.
>
> Something to think about perhaps ... how would that be different from
> using an existing external supervisor process like systemd or supervisord.

systemd wouldn't start individual PostgreSQL processes, right? If we
want a checkpointer and a wal writer and a background writer and
whatever, we have to have our own supervisor process to spawn all those
and keep them running. We could remove the logic to do a full system
reset without a postmaster exit in favor of letting systemd restart
everything from scratch, if we wanted to do that. But we'd still need
our own supervisor to start up all of the individual threads/processes
that we need.
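
For illustration only, the core of such a supervisor can be sketched
in a few lines of POSIX C. This is a toy, not the postmaster: the
names spawn_worker, reap_worker and supervise are made up, and the
restart policy (retry until a clean exit or a start limit) is an
assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*WorkerMain)(void);

/* Fork a child that runs entry(); returns the child's pid, or -1. */
pid_t spawn_worker(WorkerMain entry)
{
    assert(entry != NULL);
    pid_t pid = fork();
    if (pid == 0)
        _exit(entry());  /* child: never returns into supervisor code */
    return pid;
}

/* Wait for one worker; returns its exit code, or -1 if it died abnormally. */
int reap_worker(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/*
 * Start entry() and restart it until it exits cleanly or max_starts
 * attempts are used up. Returns the number of starts performed.
 */
int supervise(WorkerMain entry, int max_starts)
{
    int starts = 0;
    while (starts < max_starts) {
        pid_t pid = spawn_worker(entry);
        if (pid < 0)
            break;
        starts++;
        if (reap_worker(pid) == 0)
            break;
    }
    return starts;
}

/* Example workers: one exits cleanly, one always fails. */
int worker_ok(void)      { return 0; }
int worker_failing(void) { return 1; }
```

An external supervisor such as systemd can restart the whole service,
but something inside the server still has to play the role of
supervise() for each individual worker.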

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Robert Haas
On Wed, Jun 7, 2023 at 5:37 PM Andres Freund  wrote:
> I think we're starting to hit quite a few limits related to the process model,
> particularly on bigger machines. The overhead of cross-process context
> switches is inherently higher than switching between threads in the same
> process - and my suspicion is that that overhead will continue to
> increase. Once you have a significant number of connections we end up spending
> a *lot* of time in TLB misses, and that's inherent to the process model,
> because you can't share the TLB across processes.

This is a very good point.

Our default posture on this mailing list is to try to maximize use of
OS facilities rather than reimplementing things - well and good. But
if a user writes a query with FOO JOIN BAR ON FOO.X = BAR.X OR FOO.Y =
BAR.Y and then complains that the resulting query plan sucks, we don't
slink off in embarrassment: we tell the user that there's not really
any fast plan for that query and that if they write queries like that
they have to live with the consequences. But the same thing applies
here. To the extent that context switching between more processes is
more expensive than context switching between threads for
hardware-related reasons, that's not something that the OS can fix for
us. If we choose to do the expensive thing then we pay the overhead.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Robert Haas
On Wed, Jun 7, 2023 at 5:30 PM Andres Freund  wrote:
> On 2023-06-05 17:51:57 +0300, Heikki Linnakangas wrote:
> > If there are no major objections, I'm going to update the developer FAQ,
> > removing the excuses there for why we don't use threads [1].
>
> I think we should do this even if there's no consensus to slowly change to
> threads. There's clearly no consensus on the opposite either.

This is a very fair point.

> One interesting bit around the transition is what tooling we ought to provide
> to detect problems. It could e.g. be reasonably feasible to write something
> checking how many read-write global variables an extension has on linux
> systems.

Yes, this would be great.

> I don't think the control file is the right place - that seems more like
> something that should be signalled via PG_MODULE_MAGIC. We need to check this
> not just during CREATE EXTENSION, but also during loading of libraries - think
> of shared_preload_libraries.

+1.

> Yea, we definitely need the supervisor function in a separate
> process. Presumably that means we need to split off some of the postmaster
> responsibilities - e.g. I don't think it'd make sense to handle connection
> establishment in the supervisor process. I wonder if this is something that
> could end up being beneficial even in the process world.

Yeah, I've had similar thoughts. I'm not exactly sure what the
advantages of such a refactoring might be, but the current structure
feels pretty limiting. It works OK because we don't do anything in the
postmaster other than fork a new backend, but I'm not sure if that's
the best strategy. It means, for example, that if there's a ton of new
connection requests, we're spawning a ton of new processes, which
means that you can put a lot of load on a PostgreSQL instance even if
you can't authenticate. Maybe we'd be better off with a pool of
processes accepting connections; if authentication fails, that
connection goes back into the pool and tries again. If authentication
succeeds, either that process transitions to being a regular backend,
leaving the authentication pool, or perhaps hands off the connection
to a "real backend" at that point and loops around to accept() the
next request.

Whether that's a good idea in detail or not, the point remains that
having the postmaster handle this task is quite limiting. It forces us
to hand off the connection to a new process at the earliest possible
stage, so that the postmaster remains free to handle other duties.
Giving the responsibility to another process would let us make
decisions about where to perform the hand-off based on real
architectural thought rather than being forced to do it a certain way
because nothing else will work.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Konstantin Knizhnik




On 07.06.2023 3:53 PM, Robert Haas wrote:

I think I remember a previous conversation with Andres
where he opined that thread-local variables are "really expensive"
(and I apologize in advance if I'm mis-remembering this). Now, Andres
is not a man who accepts a tax on performance of any size without a
fight, so his "really expensive" might turn out to resemble my "pretty
cheap." However, if widespread use of TLS is too expensive and we have
to start rewriting code to not depend on global variables, that's
going to be more of a problem. If we can get by with doing such
rewrites only in performance-critical places, it might not still be
too bad. Personally, I think the degree of dependence that PostgreSQL
has on global variables is pretty excessive and I don't think that a
certain amount of refactoring to reduce it would be a bad thing. If it
turns into an infinite series of hastily-written patches to rejigger
every source file we have, though, then I'm not really on board with
that.


Actually TLS is not more expensive than accessing struct fields (at 
least on the x86 platform). Consider the following program:


typedef struct {
    int a;
    int b;
    int c;
} ABC;

__thread int a;
__thread int b;
__thread int c;


void use_struct(ABC* abc) {
    abc->a += 1;
    abc->b += 1;
    abc->c += 1;
}

void use_tls(ABC* abc) {
    a += 1;
    b += 1;
    c += 1;
}


Now look at the generated assembler:

use_struct:
    addl    $1, (%rdi)
    addl    $1, 4(%rdi)
    addl    $1, 8(%rdi)
    ret


use_tls:
    addl    $1, %fs:a@tpoff
    addl    $1, %fs:b@tpoff
    addl    $1, %fs:c@tpoff
    ret


Heikki mentions the idea of having a central Session object and just
passing that around. I have a hard time believing that's going to work
out nicely. First, it's not extensible. Right now, if you need a bit
of additional session-local state, you just declare a variable and
you're all set. That's not a perfect system and does cause some
problems, but we can't go from there to a system where it's impossible
to add session-local state without hacking core. Second, we will be
sad if session.h ends up #including every other header file that
defines a data structure anywhere in the backend. Or at least I'll be
sad. I'm not actually against the idea of having some kind of session
object that we pass around, but I think it either needs to be limited
to a relatively small set of well-defined things, or else it needs to
be designed in some kind of extensible way that doesn't require it to
know the full details of every sort of object that's being used as
session-local state anywhere in the system. I haven't really seen any
convincing design ideas around this yet.



There are about 2k static/global variables in Postgres.
It is almost impossible to maintain such a struct.
But a session context may still be needed for other purposes, e.g. if we want 
to support a built-in connection pool.


If we are using threads, then every variable needs to be either 
thread-local or accessed with synchronization.
But if we want to save session context, there is no need to 
save/restore all of these 2k variables.
We only need to capture the variables whose lifetime exceeds a 
transaction boundary.

There are not many such variables: tens, not hundreds.

The question is how to better handle this "session context".
There are two alternatives:
1. Save/restore this context from/to normal TLS variables.
2. Replace such variables with access through the session context struct.

I prefer 2) because it requires fewer changes in code.
And the performance overhead of session context store/resume is negligible 
when the number of such variables is ~10.
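
A minimal sketch of alternative 2, under stated assumptions: the
fields of SessionContext are invented examples of state that outlives
a transaction, and session_attach / session_current are hypothetical
names. A pooled worker thread switches sessions by repointing a
single thread-local pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: the handful of variables that outlive a transaction. */
typedef struct SessionContext {
    int  user_id;
    int  statement_timeout_ms;
    char search_path[64];
} SessionContext;

/* One pointer per worker thread; session state hangs off it. */
static __thread SessionContext *current_session = NULL;

/* Switching sessions is a single pointer assignment. */
void session_attach(SessionContext *s)
{
    current_session = s;
}

SessionContext *session_current(void)
{
    assert(current_session != NULL);
    return current_session;
}

/* Code that used to write a global now goes through the context. */
void set_statement_timeout(int ms)
{
    session_current()->statement_timeout_ms = ms;
}
```

With two contexts a and b, attaching a, setting a timeout, then
attaching b and setting another leaves each context with its own
value; nothing has to be copied in or out on a switch.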







Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Robert Haas
On Thu, Jun 8, 2023 at 6:04 AM Hannu Krosing  wrote:
> Here I was hoping to go in the opposite direction and support parallel
> query across replicas.
>
> This looks much more doable based on the process model than the single
> process / multiple threads model.

I don't think this is any more or less difficult to support in one
model vs. the other. The problems seem pretty much unrelated.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Hannu Krosing
On Thu, Jun 8, 2023 at 2:15 PM Matthias van de Meent
 wrote:
>
> On Thu, 8 Jun 2023 at 11:54, Hannu Krosing  wrote:
> >
> > On Wed, Jun 7, 2023 at 11:37 PM Andres Freund  wrote:
> > >
> > > Hi,
> > >
> > > On 2023-06-05 13:40:13 -0400, Jonathan S. Katz wrote:
> > > > 2. While I wouldn't want to necessarily discourage a moonshot effort, I
> > > > would ask if developer time could be better spent on tackling some of the
> > > > other problems around vertical scalability? Per some PGCon discussions,
> > > > there's still room for improvement in how PostgreSQL can best utilize
> > > > resources available very large "commodity" machines (a 448-core / 24TB RAM
> > > > instance comes to mind).
> > >
> > > I think we're starting to hit quite a few limits related to the process model,
> > > particularly on bigger machines. The overhead of cross-process context
> > > switches is inherently higher than switching between threads in the same
> > > process - and my suspicion is that that overhead will continue to
> > > increase. Once you have a significant number of connections we end up spending
> > > a *lot* of time in TLB misses, and that's inherent to the process model,
> > > because you can't share the TLB across processes.
> >
> >
> > This part was touched on in the "AMA with a Linux Kernel Hacker"
> > Unconference session where he mentioned that he had proposed a
> > 'mshare' syscall for this.
> >
> > So maybe a more fruitful way to fixing the perceived issues with
> > process model is to push for small changes in Linux to overcome these
> > avoiding a wholesale rewrite ?
>
> We support not just Linux, but also Windows and several (?) BSDs. I'm
> not against pushing Linux to make things easier for us, but Linux is
> an open source project, too, where someone needs to put in time to get
> the shiny things that you want. And I'd rather see our time spent in
> PostgreSQL, as Linux is only used by a part of our user base.

Do we have any statistics for the distribution of our user base ?

My gut feeling says that for performance-critical use the non-Linux share is
in the low single digits at best.

My fascination with open source started with the realisation that instead of
workarounds you can actually fix the problem at the source. So if the
specific problem is that the TLB is not shared, then the proper fix is
making it shared instead of rewriting everything else to get around
it. None of us is limited to writing code in PostgreSQL only. If the
easiest and most generic fix can be done in Linux then so be it.

It is also possible that Windows and *BSD already have a similar feature.

>
> > > The amount of duplicated code we have to deal with due to the process model
> > > is quite substantial. We have local memory, statically allocated shared memory
> > > and dynamically allocated shared memory variants for some things. And that's
> > > just going to continue.
> >
> > Maybe we can already remove the distinction between static and dynamic
> > shared memory ?
>
> That sounds like a bad idea, dynamic shared memory is more expensive
> to maintain than our static shared memory systems, not least
> because DSM is not guaranteed to share the same addresses in each
> process' address space.

Then this too needs to be fixed

>
> > Though I already heard some complaints at the conference discussions
> > that having the dynamic version available has made some developers
> > sloppy in using it resulting in wastefulness.
>
> Do you know any examples of this wastefulness?

No. Just somebody mentioned it in a hallway conversation and the rest
of the developers present mumbled approvingly :)

> > > > I'm purposely giving a nonanswer on whether it's a worthwhile goal, but
> > > > rather I'd be curious where it could stack up against some other efforts to
> > > > continue to help PostgreSQL improve performance and handle very large
> > > > workloads.
> > >
> > > There's plenty of things we can do before, but in the end I think tackling the
> > > issues you mention and moving to threads are quite tightly linked.
> >
> > Still we should be focusing our attention at solving the issues and
> > not at "moving to threads" and hoping this will fix the issues by
> > itself.
>
> I suspect that it is much easier to solve some of the issues when
> working in a shared address space.

Probably. But it would come at the cost of needing to change a lot of
other parts of PostgreSQL.

I am not against making code cleaner for potential threaded model
support. I am just a bit sceptical about the actual switch being easy,
or doable in the next 10-15 years.

> E.g. resizing shared_buffers is difficult right now due to the use of
> a static allocation of shared memory, but if we had access to a single
> shared address space, it'd be easier to do any cleanup necessary for
> dynamically increasing/decreasing its size.

This again could be done with shared memory mapping + dynamic shared memory.

> Same 

Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Matthias van de Meent
On Thu, 8 Jun 2023 at 11:54, Hannu Krosing  wrote:
>
> On Wed, Jun 7, 2023 at 11:37 PM Andres Freund  wrote:
> >
> > Hi,
> >
> > On 2023-06-05 13:40:13 -0400, Jonathan S. Katz wrote:
> > > 2. While I wouldn't want to necessarily discourage a moonshot effort, I
> > > would ask if developer time could be better spent on tackling some of the
> > > other problems around vertical scalability? Per some PGCon discussions,
> > > there's still room for improvement in how PostgreSQL can best utilize
> > > resources available very large "commodity" machines (a 448-core / 24TB RAM
> > > instance comes to mind).
> >
> > I think we're starting to hit quite a few limits related to the process model,
> > particularly on bigger machines. The overhead of cross-process context
> > switches is inherently higher than switching between threads in the same
> > process - and my suspicion is that that overhead will continue to
> > increase. Once you have a significant number of connections we end up spending
> > a *lot* of time in TLB misses, and that's inherent to the process model,
> > because you can't share the TLB across processes.
>
>
> This part was touched on in the "AMA with a Linux Kernel Hacker"
> Unconference session where he mentioned that he had proposed a
> 'mshare' syscall for this.
>
> So maybe a more fruitful way to fixing the perceived issues with
> process model is to push for small changes in Linux to overcome these
> avoiding a wholesale rewrite ?

We support not just Linux, but also Windows and several (?) BSDs. I'm
not against pushing Linux to make things easier for us, but Linux is
an open source project, too, where someone needs to put in time to get
the shiny things that you want. And I'd rather see our time spent in
PostgreSQL, as Linux is only used by a part of our user base.

> > The amount of duplicated code we have to deal with due to the process model
> > is quite substantial. We have local memory, statically allocated shared memory
> > and dynamically allocated shared memory variants for some things. And that's
> > just going to continue.
>
> Maybe we can already remove the distinction between static and dynamic
> shared memory ?

That sounds like a bad idea, dynamic shared memory is more expensive
to maintain than our static shared memory systems, not least
because DSM is not guaranteed to share the same addresses in each
process' address space.

> Though I already heard some complaints at the conference discussions
> that having the dynamic version available has made some developers
> sloppy in using it resulting in wastefulness.

Do you know any examples of this wastefulness?

> > > I'm purposely giving a nonanswer on whether it's a worthwhile goal, but
> > > rather I'd be curious where it could stack up against some other efforts to
> > > continue to help PostgreSQL improve performance and handle very large
> > > workloads.
> >
> > There's plenty of things we can do before, but in the end I think tackling the
> > issues you mention and moving to threads are quite tightly linked.
>
> Still we should be focusing our attention at solving the issues and
> not at "moving to threads" and hoping this will fix the issues by
> itself.

I suspect that it is much easier to solve some of the issues when
working in a shared address space.
E.g. resizing shared_buffers is difficult right now due to the use of
a static allocation of shared memory, but if we had access to a single
shared address space, it'd be easier to do any cleanup necessary for
dynamically increasing/decreasing its size.
Same with parallel workers - if we have a shared address space, the
workers can pass any sized objects around without being required to
move the tuples through DSM and waiting for the leader process to
empty that buffer when it gets full.

Sure, most of that is probably possible with DSM as well, it's just
that I see a lot more issues that you need to take care of when you
don't have a shared address space (such as the pointer translation we
do in dsa_get_address).
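
To make the pointer-translation cost concrete, here is a minimal
sketch of offset-based ("relative") pointers. It is written in the
same spirit as the translation mentioned above but is not
PostgreSQL's dsa code; RelPtr, to_rel, to_abs and clone_segment are
made-up names, and clone_segment merely stands in for a second
backend that has the segment mapped at a different address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Byte offset from the segment base; 0 is reserved to mean NULL. */
typedef size_t RelPtr;

static inline RelPtr to_rel(const void *base, const void *p)
{
    return p ? (size_t) ((const char *) p - (const char *) base) : 0;
}

static inline void *to_abs(void *base, RelPtr r)
{
    return r ? (char *) base + r : NULL;
}

/* A list node that is safe to store inside a shared segment. */
typedef struct Node {
    int    value;
    RelPtr next;    /* an offset, never a raw pointer */
} Node;

/* Every hop pays for a base + offset translation. */
int sum_list(void *base, RelPtr head)
{
    int sum = 0;
    for (Node *n = to_abs(base, head); n != NULL; n = to_abs(base, n->next))
        sum += n->value;
    return sum;
}

/* Stand-in for another backend mapping the segment elsewhere. */
void *clone_segment(const void *base, size_t size)
{
    void *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, base, size);
    return copy;
}
```

A list built in one copy of the segment can be walked through the
other copy with sum_list(), because only offsets are stored. With a
single shared address space, the offsets and the translation on every
dereference would be unnecessary.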

Kind regards,

Matthias van de Meent
Neon, Inc.




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Jose Luis Tallon

On 7/6/23 23:37, Andres Freund wrote:

[snip]
I think we're starting to hit quite a few limits related to the process model,
particularly on bigger machines. The overhead of cross-process context
switches is inherently higher than switching between threads in the same
process - and my suspicion is that that overhead will continue to
increase. Once you have a significant number of connections we end up spending
a *lot* of time in TLB misses, and that's inherent to the process model,
because you can't share the TLB across processes.


IMHO, as one sysadmin who has previously played with Postgres on "quite 
large" machines, I'd propose what most would call a "hybrid model"


* Threads are a very valuable addition for the "frontend" of the server. 
Most would call this a built-in session-aware connection pooler :)


    Heikki's (and others') efforts towards separating connection state 
into discrete structs are clearly a prerequisite for this; 
implementation-wise, just toss the connState into a TLS [thread-local 
storage] variable and many problems just vanish.


    Postgres wouldn't be the first to adopt this approach, either...

* For "heavyweight" queries, the scalability of "almost independent" 
processes w.r.t. NUMA is just _impossible to achieve_ (locality of 
reference!) with a pure threaded system. When CPU+mem-bound 
(bandwidth-wise), threads add nothing IMO.


Indeed a separate postmaster is very much needed in order to control the 
processes / guard overall integrity.



Hence, my humble suggestion is to consider a hybrid architecture which 
benefits from each model's strengths. I am quite convinced that 
transition would be much safer and simpler (I do share most of Tom's and 
others' concerns...)


Other projects to draw inspiration from:

 * Postfix -- multi-process, postfix's master guards processes and 
performs privileged operations; unprivileged "subsystems". Interesting 
IPC solutions
 * Apache -- MPMs provide flexibility and support for e.g. non-threaded 
workloads (PHP is the most popular; cfr. "prefork" multi-process MPM)
 * NginX is actually multi-process (one per CPU) + event-based 
(multiplexing) ...
 * PowerDNS is internally threaded, but has a "guardian" process. Seems 
to be evolving to a more hybrid model.



I would suggest something along the lines of :

* postmaster -- process supervision and (potentially privileged) 
operations; process coordination (i.e descriptor passing); mostly as-is

* *frontend* -- connection/session handling; possibly even event-driven
* backends -- process heavyweight queries as independently as possible. 
Can span worker threads AND processes when needed
* *dispatcher* -- takes care of cached/lightweight queries (cached 
catalog / full snapshot visibility+processing)
* utility processes can be left "as is" mostly, except to be made 
multi-threaded for heavy-sync ones (e.g. vacuum workers, stat workers)


For fixed-size buffers, i.e. pages / chunks, I'd say mmapped (anonymous) 
shared memory isn't that bad... but haven't read the actual code in years.


For message queues / invalidation messages, I guess that shmem-based 
sync is really a nuisance. My understanding is that Linux-specific (i.e. 
eventfd) mechanisms aren't quite considered .. or are they?



The amount of duplicated code we have to deal with due to the process model
is quite substantial. We have local memory, statically allocated shared memory
and dynamically allocated shared memory variants for some things. And that's
just going to continue.


Code duplication is indeed a problem... but I wouldn't call "different 
approaches/solution for very similar problems depending on 
context/requirement" a duplicate. I might well be wrong / lack detail, 
though... (again: haven't read PG's code for some years already).



Just my two cents.


Thanks,

    J.L.

--
Parkinson's Law: Work expands to fill the time allotted to it.


Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Andrew Dunstan


On 2023-06-07 We 17:58, Andres Freund wrote:

Hi,

On 2023-06-07 08:53:24 -0400, Robert Haas wrote:

Now, Andres is not a man who accepts a tax on performance of any size
without a fight, so his "really expensive" might turn out to resemble my
"pretty cheap." However, if widespread use of TLS is too expensive and we
have to start rewriting code to not depend on global variables, that's going
to be more of a problem. If we can get by with doing such rewrites only in
performance-critical places, it might not still be too bad. Personally, I
think the degree of dependence that PostgreSQL has on global variables is
pretty excessive and I don't think that a certain amount of refactoring to
reduce it would be a bad thing. If it turns into an infinite series of
hastily-written patches to rejigger every source file we have, though, then
I'm not really on board with that.

I think a lot of such rewrites would be a good idea, even if we right now all
agree to swear we'll never go to threads.  Not having any sort of grouping of
global variables makes it IMO considerably harder to debug. I can easily ask
somebody to print out a variable pointing to a struct describing the state of
a subsystem. I can't really do that for 50 variables.

And once you do that, I think you reduce the TLS cost substantially. The
variable pointing to the struct is already likely in a register. Whereas each
individual variable being in TLS makes the job harder for the compiler.
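
A minimal sketch of that grouping, assuming a hypothetical SortState
subsystem (the struct, its fields and the function names are invented
for the illustration): the state lives in one struct, the only
thread-local object is a pointer to it, and a hot loop loads that
pointer once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subsystem state, grouped instead of spread over globals. */
typedef struct SortState {
    long tuples_seen;
    long bytes_used;
    int  runs;
} SortState;

/* The only thread-local object for this subsystem. */
static __thread SortState *sort_state = NULL;

void sort_state_bind(SortState *s)
{
    sort_state = s;
}

void account_tuples(int n, long bytes_each)
{
    SortState *s = sort_state;  /* one TLS load, then plain field accesses */
    assert(s != NULL);
    for (int i = 0; i < n; i++) {
        s->tuples_seen++;
        s->bytes_used += bytes_each;
    }
}
```

The debugging benefit follows from the same layout: printing
*sort_state in a debugger shows the whole subsystem state at once.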



I could certainly get on board with a project to tame the use of global 
variables.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Tomas Vondra



On 6/8/23 01:37, Thomas Munro wrote:
> On Thu, Jun 8, 2023 at 10:37 AM Jeremy Schneider
>  wrote:
>> On 6/7/23 2:39 PM, Thomas Kellerer wrote:
>>> Tomas Vondra schrieb am 07.06.2023 um 21:20:
>>>> Also, which other projects did this transition? Is there something we
>>>> could learn from them? Were they restricted to much smaller list of
>>>> platforms?
>>>
>>> Not open source, but Oracle was historically multi-threaded on Windows
>>> and multi-process on all other platforms.
>>> I _think_ starting with 19c you can optionally run it multi-threaded on
>>> Linux as well.
>> Looks like it actually became publicly available in 12c. AFAICT Oracle
>> supports both modes today, with a config parameter to switch between them.
> 
> It's old, but this describes the 4 main models and which well known
> RDBMSes use them in section 2.3:
> 
> https://dsf.berkeley.edu/papers/fntdb07-architecture.pdf
> 
> TL;DR DB2 is the winner, it can do process-per-connection,
> thread-per-connection, process-pool or thread-pool.
> 

I think the basic architectures are known, especially from the user
perspective. I'm more interested in challenges the projects faced while
moving from one architecture to the other, or how / why they support
more than just one, etc.

In [1] Heikki argued that:

I don't think this is worth it, unless we plan to eventually remove
the multi-process mode. ... As long as you need to also support
processes, you need to code to the lowest common denominator and
don't really get the benefits.

But these projects clearly support multiple architectures, and have no
intention to ditch some of them. So how did they do that? Surely they
think there are benefits.

One option would be to just have separate code paths for processes and
threads, but the effort required to maintain and improve that would be
deadly. So the only feasible option seems to be they managed to abstract
the subsystems enough for the "regular" code to not care about model.


[1]
https://www.postgresql.org/message-id/6e3082dc-ff29-9cbf-847e-5f570828b...@iki.fi

> I understand this thread to be about thread-per-connection (= backend,
> session, socket) for now.

Maybe, although people also proposed to switch the parallel query to
threads (so that'd be multiple threads per session). But I don't think
it really matters, the concerns are mostly about moving from one
architecture to another and/or supporting both.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Hannu Krosing
On Thu, Jun 8, 2023 at 11:54 AM Hannu Krosing  wrote:
>
> On Wed, Jun 7, 2023 at 11:37 PM Andres Freund  wrote:
> >
> > Hi,
> >
> > On 2023-06-05 13:40:13 -0400, Jonathan S. Katz wrote:
> > > 2. While I wouldn't want to necessarily discourage a moonshot effort, I
> > > would ask if developer time could be better spent on tackling some of the
> > > other problems around vertical scalability? Per some PGCon discussions,
> > > there's still room for improvement in how PostgreSQL can best utilize
> > > resources available very large "commodity" machines (a 448-core / 24TB RAM
> > > instance comes to mind).
> >
> > I think we're starting to hit quite a few limits related to the process model,
> > particularly on bigger machines. The overhead of cross-process context
> > switches is inherently higher than switching between threads in the same
> > process - and my suspicion is that that overhead will continue to
> > increase. Once you have a significant number of connections we end up spending
> > a *lot* of time in TLB misses, and that's inherent to the process model,
> > because you can't share the TLB across processes.
>
>
> This part was touched on in the "AMA with a Linux Kernel Hacker"
> Unconference session where he mentioned that he had proposed a
> 'mshare' syscall for this.

Also, the *static* huge pages already let you solve this problem now
by sharing the page tables


Cheers
Hannu




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Hannu Krosing
On Thu, Jun 8, 2023 at 12:09 AM Andres Freund  wrote:
...

> We could e.g. eventually decide that we
> don't support parallel query without threading support - which would allow us
> to get rid of a very significant amount of code and runtime overhead.

Here I was hoping to go in the opposite direction and support parallel
query across replicas.

This looks much more doable based on the process model than the single
process / multiple threads model.

---
Cheers
Hannu




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Hannu Krosing
I think I remember that in the early days of development somebody did
send a patch-set for making PostgreSQL threaded on Solaris.

I don't remember why this did not catch on.

On Wed, Jun 7, 2023 at 11:40 PM Thomas Kellerer  wrote:
>
> Tomas Vondra schrieb am 07.06.2023 um 21:20:
> > Also, which other projects did this transition? Is there something we
> > could learn from them? Were they restricted to much smaller list of
> > platforms?
>
> Firebird did this a while ago if I'm not mistaken.
>
> Not open source, but Oracle was historically multi-threaded on Windows and 
> multi-process on all other platforms.
> I _think_ starting with 19c you can optionally run it multi-threaded on Linux 
> as well.
>
> But I doubt they are willing to share any insights ;)
>
>
>




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Hannu Krosing
On Wed, Jun 7, 2023 at 11:37 PM Andres Freund  wrote:
>
> Hi,
>
> On 2023-06-05 13:40:13 -0400, Jonathan S. Katz wrote:
> > 2. While I wouldn't want to necessarily discourage a moonshot effort, I
> > would ask if developer time could be better spent on tackling some of the
> > other problems around vertical scalability? Per some PGCon discussions,
> > there's still room for improvement in how PostgreSQL can best utilize
> > resources available very large "commodity" machines (a 448-core / 24TB RAM
> > instance comes to mind).
>
> I think we're starting to hit quite a few limits related to the process model,
> particularly on bigger machines. The overhead of cross-process context
> switches is inherently higher than switching between threads in the same
> process - and my suspicion is that that overhead will continue to
> increase. Once you have a significant number of connections we end up spending
> a *lot* of time in TLB misses, and that's inherent to the process model,
> because you can't share the TLB across processes.


This part was touched on in the "AMA with a Linux Kernel Hacker"
Unconference session, where he mentioned that he had proposed an
'mshare' syscall for this.

So maybe a more fruitful way of fixing the perceived issues with the
process model is to push for small changes in Linux that overcome them,
avoiding a wholesale rewrite?

>
>
> The amount of duplicated code we have to deal with due to the process model
> is quite substantial. We have local memory, statically allocated shared memory
> and dynamically allocated shared memory variants for some things. And that's
> just going to continue.

Maybe we can already remove the distinction between static and dynamic
shared memory?

Though I already heard some complaints at the conference discussions
that having the dynamic version available has made some developers
sloppy in using it resulting in wastefulness.

>
>
> > I'm purposely giving a nonanswer on whether it's a worthwhile goal, but
> > rather I'd be curious where it could stack up against some other efforts to
> > continue to help PostgreSQL improve performance and handle very large
> > workloads.
>
> There's plenty of things we can do before, but in the end I think tackling the
> issues you mention and moving to threads are quite tightly linked.

Still, we should be focusing our attention on solving the issues and
not on "moving to threads" and hoping this will fix the issues by
itself.

Cheers
Hannu




Re: Let's make PostgreSQL multi-threaded

2023-06-07 Thread Drouvot, Bertrand

Hi,

On 6/8/23 12:37 AM, Jeremy Schneider wrote:
> On 6/7/23 2:39 PM, Thomas Kellerer wrote:
>> Tomas Vondra wrote on 07.06.2023 at 21:20:
>
> I did google search for "oracle threaded_execution" and browsed a bit;
> didn't see anything that seems earth shattering so far.


FWIW, I recall Karl Arao's wiki page: 
https://karlarao.github.io/karlaraowiki/#%2212c%20threaded_execution%22
where some performance and memory consumption studies have been done.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Let's make PostgreSQL multi-threaded

2023-06-07 Thread Dilip Kumar
On Thu, Jun 8, 2023 at 3:00 AM Andres Freund  wrote:
>

> Yea, we definitely need the supervisor function in a separate
> process. Presumably that means we need to split off some of the postmaster
> responsibilities - e.g. I don't think it'd make sense to handle connection
> establishment in the supervisor process. I wonder if this is something that
> could end up being beneficial even in the process world.
>
> A related issue is that we won't get SIGCHLD in the supervisor process
> anymore. So we'd need to come up with some design for that.

If we fork the main Postgres process from the supervisor process then
any exit to the main process will send SIGCHLD in the supervisor
process, right?  I agree we can handle all connection establishment
and other thread-related stuff in the main Postgres process.  But I
assume this main process should be forked out of the supervisor
process.
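
To make the SIGCHLD point concrete, here is a minimal sketch, assuming a
POSIX system: a supervisor forks a child that stands in for the main
server process and learns about its exit through waitpid(). The helper
name supervise_once() and the exit codes are made up for illustration;
this is not how postmaster is actually structured.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that stands in for the "main" server
 * process, let it exit with child_status, and report what the supervisor
 * observes.  Returns the child's exit code, or -1 on error or if the
 * child was terminated by a signal.
 */
static int
supervise_once(int child_status)
{
    pid_t pid;
    int   status;

    assert(child_status >= 0 && child_status <= 255);

    pid = fork();
    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0)
        _exit(child_status);    /* child: pretend the server ran and exited */

    /*
     * Parent (supervisor): the kernel raises SIGCHLD for us when the child
     * exits.  Here we simply block in waitpid() instead of installing a
     * handler.
     */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;                  /* killed by a signal: treat as a crash */
}
```

A real supervisor would presumably loop, restarting the child whenever
supervise_once() reports a crash; that policy is left out here.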

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Let's make PostgreSQL multi-threaded

2023-06-07 Thread Thomas Munro
On Thu, Jun 8, 2023 at 10:37 AM Jeremy Schneider
 wrote:
> On 6/7/23 2:39 PM, Thomas Kellerer wrote:
> > Tomas Vondra wrote on 07.06.2023 at 21:20:
> >> Also, which other projects did this transition? Is there something we
> >> could learn from them? Were they restricted to much smaller list of
> >> platforms?
> >
> > Not open source, but Oracle was historically multi-threaded on Windows
> > and multi-process on all other platforms.
> > I _think_ starting with 19c you can optionally run it multi-threaded on
> > Linux as well.
> Looks like it actually became publicly available in 12c. AFAICT Oracle
> supports both modes today, with a config parameter to switch between them.

It's old, but this describes the 4 main models and which well known
RDBMSes use them in section 2.3:

https://dsf.berkeley.edu/papers/fntdb07-architecture.pdf

TL;DR DB2 is the winner, it can do process-per-connection,
thread-per-connection, process-pool or thread-pool.

I understand this thread to be about thread-per-connection (= backend,
session, socket) for now.




Re: Let's make PostgreSQL multi-threaded

2023-06-07 Thread Jeremy Schneider
On 6/7/23 2:39 PM, Thomas Kellerer wrote:
> Tomas Vondra wrote on 07.06.2023 at 21:20:
>> Also, which other projects did this transition? Is there something we
>> could learn from them? Were they restricted to much smaller list of
>> platforms?
> 
> Not open source, but Oracle was historically multi-threaded on Windows
> and multi-process on all other platforms.
> I _think_ starting with 19c you can optionally run it multi-threaded on
> Linux as well.
Looks like it actually became publicly available in 12c. AFAICT Oracle
supports both modes today, with a config parameter to switch between them.

This is a very interesting case study.

Concepts Manual:

https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/process-architecture.html#GUID-4B460E97-18A0-4F5A-A62F-9608FFD43664

Reference:

https://docs.oracle.com/en/database/oracle/oracle-database/23/refrn/THREADED_EXECUTION.html#GUID-7A668A49-9FC5-4245-AD27-10D90E5AE8A8

List of Oracle process types, which ones can run as threads and which
ones always run as processes:

https://docs.oracle.com/en/database/oracle/oracle-database/23/refrn/background-processes.html#GUID-86184690-5531-405F-AA05-BB935F57B76D

Looks like they have four processes that will never run in threads:
* dbwriter (writes dirty blocks in background)
* process monitor (cleanup after process crash to avoid full server
restarts) 
* process spawner (like postmaster)
* time keeper process

Per Tim Hall's oracle-base, it seems that plenty of people are sticking
with the process model, and that one use case for threads was:
"consolidating lots of instances onto a single server without using the
multitennant option. Without the multithreaded model, the number of OS
processes could get very high."

https://oracle-base.com/articles/12c/multithreaded-model-using-threaded_execution_12cr1

I did google search for "oracle threaded_execution" and browsed a bit;
didn't see anything that seems earth shattering so far.

Ludovico Caldara and Martin Bach published blogs when it was first
released, which just introduced it but didn't test or hammer on it. The
feature has existed for 10 years now and I don't see any blog posts
saying that "everyone should use this because it doubles your
performance" or anything like that. I think if there were really
significant performance gains then there would be many interesting blog
posts on the internet by now from the independent Oracle professional
community - I know many of these people.

In fact, there's an interesting blog by Kamil Stawiarski from 2015 where
he actually observed one case of /slower/ performance with threads. That
blog post ends with: "So I raise the question: why and when use threaded
execution? If ever?"

https://blog.ora-600.pl/2015/12/17/oracle-12c-internals-of-threaded-execution/

I'm not sure if he ever got an answer

-Jeremy

-- 
http://about.me/jeremy_schneider





Re: Let's make PostgreSQL multi-threaded

2023-06-07 Thread Andres Freund
Hi,

On 2023-06-06 16:14:41 -0400, Greg Stark wrote:
> I think of processes and threads as fundamentally the same things,
> just a slightly different API -- namely that in one memory is by
> default unshared and needs to be explicitly shared and in the other
> it's default shared and needs to be explicitly unshared.

In theory that's true, in practice it's entirely wrong.

For one, the amount of complexity you need to deal with to share state across
processes, post fork, is *substantial*.  You can share file descriptors across
processes, but it's extremely platform dependent, requires cooperation between
both processes etc.  You can share memory allocations made after the processes
forked, but you're typically not going to be able to guarantee they're at the
same pointer values. Etc.
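
A minimal sketch of the pointer-value problem, assuming a POSIX system:
the same shared file is mapped twice, so the two mappings land at
different addresses much as they could in two unrelated processes. A raw
pointer stored in the segment would only be valid through one mapping,
whereas an offset from the segment base works through both. The names
relptr, relptr_make, relptr_resolve and map_twice are hypothetical and
chosen for this example only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* An offset from the start of a shared segment, used instead of a pointer. */
typedef size_t relptr;

static relptr
relptr_make(void *base, void *p)
{
    return (size_t) ((char *) p - (char *) base);
}

static void *
relptr_resolve(void *base, relptr off)
{
    return (char *) base + off;
}

/*
 * Map the same temporary file twice with MAP_SHARED.  The two mappings
 * share their contents but live at different addresses.  Returns 0 on
 * success, -1 on failure.
 */
static int
map_twice(size_t size, void **a, void **b)
{
    FILE *f = tmpfile();
    int   fd;

    if (f == NULL)
        return -1;
    fd = fileno(f);
    if (ftruncate(fd, (off_t) size) != 0)
    {
        fclose(f);
        return -1;
    }
    *a = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    *b = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    fclose(f);                  /* existing mappings stay valid after close */
    return (*a == MAP_FAILED || *b == MAP_FAILED) ? -1 : 0;
}
```

With threads, both sides see one address space, so an ordinary pointer
is enough and this translation layer is unnecessary.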

But more importantly, there are crucial performance differences between threads
and processes. Having the same memory mapping between threads allows the
hardware to share the TLB (on x86 via process context identifiers), which
isn't realistically possible with different processes.


> However all else is not equal. The discussion in the hallway turned to
> whether we could just use pthread primitives like mutexes and
> condition variables instead of our own locks -- and the point was
> raised that those libraries assume these objects will be in threads of
> one process not shared across completely different processes.

Independent of threads vs processes, I am -many on using pthread mutexes and
condition variables. From experiments, that *loses* performance, and we lose
a lot of control and increase cross-platform behavioural differences.  I also
don't see any benefit in going in that direction.


> And that's probably not the only library we're stuck reimplementing
> because of this. So the question is are these things worth taking the
> risk of having data structures shared implicitly and having unclear
> ownership rules?
> 
> I was going to say supporting both modes relieves that fear since it
> would force that extra discipline and allow testing under the more
> restrictive rule. However I don't think that will actually work. As
> long as we support both modes we lose all the advantages of threads.

I don't think that has to be true. We could e.g. eventually decide that we
don't support parallel query without threading support - which would allow us
to get rid of a very significant amount of code and runtime overhead.

Greetings,

Andres Freund




Re: Let's make PostgreSQL multi-threaded

2023-06-07 Thread Andres Freund
Hi,

On 2023-06-07 08:53:24 -0400, Robert Haas wrote:
> In my mind, the bigger question is how much further than that do you
> have to go? I think I remember a previous conversation with Andres
> where he opined that thread-local variables are "really expensive"
> (and I apologize in advance if I'm mis-remembering this).

It really is architecture and OS dependent. I think time has reduced the cost
somewhat, due to older architectures / OSs aging out. But yea, it's not free.

I suspect that we'd gain *far* more from the higher TLB hit rate, than we'd
lose due to using many thread local variables. Even with a stupid
search-and-replace approach.

But we'd gain more if we reduced the number of thread local variables...


> Now, Andres is not a man who accepts a tax on performance of any size
> without a fight, so his "really expensive" might turn out to resemble my
> "pretty cheap." However, if widespread use of TLS is too expensive and we
> have to start rewriting code to not depend on global variables, that's going
> to be more of a problem. If we can get by with doing such rewrites only in
> performance-critical places, it might still not be too bad. Personally, I
> think the degree of dependence that PostgreSQL has on global variables is
> pretty excessive and I don't think that a certain amount of refactoring to
> reduce it would be a bad thing. If it turns into an infinite series of
> hastily-written patches to rejigger every source file we have, though, then
> I'm not really on board with that.

I think a lot of such rewrites would be a good idea, even if we right now all
agree to swear we'll never go to threads.  Not having any sort of grouping of
global variables makes it IMO considerably harder to debug. I can easily ask
somebody to print out a variable pointing to a struct describing the state of
a subsystem. I can't really do that for 50 variables.

And once you do that, I think you reduce the TLS cost substantially. The
variable pointing to the struct is already likely in a register. Whereas each
individual variable being in TLS makes the job harder for the compiler.
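
A minimal sketch of that grouping, assuming C11 _Thread_local: the
subsystem's state lives in one struct reached through a single
thread-local pointer, instead of each field being its own thread-local
variable. SortState and its fields are hypothetical names invented for
this example, not existing PostgreSQL structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subsystem state; the field names are illustrative only. */
typedef struct SortState
{
    int  work_mem_kb;
    long tuples_sorted;
    int  nruns;
} SortState;

/* One thread-local pointer instead of three thread-local scalars. */
static _Thread_local SortState *sort_state = NULL;

/* Point the current thread at its own copy of the subsystem state. */
static void
sort_state_attach(SortState *state)
{
    sort_state = state;
}

/* Record one sorted run of ntuples tuples; returns the running total. */
static long
sort_account(int ntuples)
{
    SortState *s = sort_state;  /* single TLS access, then plain loads/stores */

    assert(s != NULL);
    s->tuples_sorted += ntuples;
    s->nruns++;
    return s->tuples_sorted;
}
```

In a debugger, printing *sort_state then shows the whole subsystem state
at once, which is the debuggability benefit described above.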

Greetings,

Andres Freund




Re: Let's make PostgreSQL multi-threaded

2023-06-07 Thread Andres Freund
Hi,

On 2023-06-07 23:39:01 +0200, Peter Eisentraut wrote:
> On 07.06.23 23:30, Andres Freund wrote:
> > Yea, we definitely need the supervisor function in a separate
> > process. Presumably that means we need to split off some of the postmaster
> > responsibilities - e.g. I don't think it'd make sense to handle connection
> > establishment in the supervisor process. I wonder if this is something that
> > could end up being beneficial even in the process world.
> 
> Something to think about perhaps ... how would that be different from using
> an existing external supervisor process like systemd or supervisord?

I think that's not really comparable. A postgres internal solution can
maintain resources like shared memory allocations, listening sockets, etc
across crash restarts. With something like systemd that's much harder to make
work well.  And then there's the fact that you now need to deal with much more
drastic cross-platform behavioural differences.

Greetings,

Andres Freund




Re: Let's make PostgreSQL multi-threaded

2023-06-07 Thread Andres Freund
Hi,

On 2023-06-05 20:15:56 -0400, Bruce Momjian wrote:
> Yes, sorry, critical sections is what I was remembering.  My question is
> whether all unexpected backend exits should be treated as critical
> sections?

Yes.

People have argued that the process model is more robust. But it turns out
that we have to crash-restart for just about any "bad failure" anyway. It used
to be (a long time ago) that we didn't, but that was just broken.

There are some advantages in debuggability, because it's a *tad* harder for a
bug in one process to cause another to crash, if less state is shared. But
that's by far outweighed by most debugging / validation tools not
understanding the multi-processes-with-shared-shmem model.

Greetings,

Andres Freund




Re: Let's make PostgreSQL multi-threaded

2023-06-07 Thread Thomas Kellerer

Tomas Vondra wrote on 07.06.2023 at 21:20:
> Also, which other projects did this transition? Is there something we
> could learn from them? Were they restricted to much smaller list of
> platforms?


Firebird did this a while ago if I'm not mistaken.

Not open source, but Oracle was historically multi-threaded on Windows and 
multi-process on all other platforms.
I _think_ starting with 19c you can optionally run it multi-threaded on Linux 
as well.

But I doubt they are willing to share any insights ;)





Re: Let's make PostgreSQL multi-threaded

2023-06-07 Thread Peter Eisentraut

On 07.06.23 23:30, Andres Freund wrote:
> Yea, we definitely need the supervisor function in a separate
> process. Presumably that means we need to split off some of the postmaster
> responsibilities - e.g. I don't think it'd make sense to handle connection
> establishment in the supervisor process. I wonder if this is something that
> could end up being beneficial even in the process world.


Something to think about perhaps ... how would that be different from
using an existing external supervisor process like systemd or supervisord?





Re: Let's make PostgreSQL multi-threaded

2023-06-07 Thread Andres Freund
Hi,

On 2023-06-05 13:40:13 -0400, Jonathan S. Katz wrote:
> 2. While I wouldn't want to necessarily discourage a moonshot effort, I
> would ask if developer time could be better spent on tackling some of the
> other problems around vertical scalability? Per some PGCon discussions,
> there's still room for improvement in how PostgreSQL can best utilize
> resources available very large "commodity" machines (a 448-core / 24TB RAM
> instance comes to mind).

I think we're starting to hit quite a few limits related to the process model,
particularly on bigger machines. The overhead of cross-process context
switches is inherently higher than switching between threads in the same
process - and my suspicion is that that overhead will continue to
increase. Once you have a significant number of connections we end up spending
a *lot* of time in TLB misses, and that's inherent to the process model,
because you can't share the TLB across processes.


The amount of duplicated code we have to deal with due to the process model
is quite substantial. We have local memory, statically allocated shared memory
and dynamically allocated shared memory variants for some things. And that's
just going to continue.


> I'm purposely giving a nonanswer on whether it's a worthwhile goal, but
> rather I'd be curious where it could stack up against some other efforts to
> continue to help PostgreSQL improve performance and handle very large
> workloads.

There's plenty of things we can do before, but in the end I think tackling the
issues you mention and moving to threads are quite tightly linked.


Greetings,

Andres Freund




Re: Let's make PostgreSQL multi-threaded

2023-06-07 Thread Andres Freund
Hi,

On 2023-06-05 17:51:57 +0300, Heikki Linnakangas wrote:
> If there are no major objections, I'm going to update the developer FAQ,
> removing the excuses there for why we don't use threads [1].

I think we should do this even if there's no consensus to slowly change to
threads. There's clearly no consensus on the opposite either.



> # Transition period
> 
> The transition surely cannot be done fully in one release. Even if we could
> pull it off in core, extensions will need more time to adapt. There will be
> a transition period of at least one release, probably more, where you can
> choose multi-process or multi-thread model using a GUC. Depending on how it
> goes, we can document it as experimental at first.

One interesting bit around the transition is what tooling we ought to provide
to detect problems. It could e.g. be reasonably feasible to write something
checking how many read-write global variables an extension has on linux
systems.



> # Extensions
> 
> A lot of extensions also contain global variables or other things that break
> in a multi-threaded environment. We need a way to label extensions that
> support multi-threading. And in the future, also extensions that *require* a
> multi-threaded server.
> 
> Let's add flags to the control file to mark if the extension is thread-safe
> and/or process-safe. If you try to load an extension that's not compatible
> with the server's mode, throw an error.

I don't think the control file is the right place - that seems more like
something that should be signalled via PG_MODULE_MAGIC. We need to check this
not just during CREATE EXTENSION, but also during loading of libraries - think
of shared_preload_libraries.



> # Restart on crash
> 
> If a backend process crashes, postmaster terminates all other backends and
> restarts the system. That's hard (impossible?) to do safely if everything
> runs in one process. We can continue to have a separate postmaster process that
> just monitors the main process and restarts it on crash.

Yea, we definitely need the supervisor function in a separate
process. Presumably that means we need to split off some of the postmaster
responsibilities - e.g. I don't think it'd make sense to handle connection
establishment in the supervisor process. I wonder if this is something that
could end up being beneficial even in the process world.

A related issue is that we won't get SIGCHLD in the supervisor process
anymore. So we'd need to come up with some design for that.

Greetings,

Andres Freund




Re: Let's make PostgreSQL multi-threaded

2023-06-07 Thread Thomas Munro
On Thu, Jun 8, 2023 at 7:20 AM Tomas Vondra
 wrote:
> Is the platform support really there for all platforms we want/intend to
> support? I have no problem believing that for modern Linux/BSD systems,
> but what about the older stuff we currently support.

There is a conversation to be had about whether/when/how to adopt
C11/C17 threads (= same API on Windows and Unix, but sadly two
straggler systems don't have required OS support yet (macOS,
OpenBSD)), but POSIX + NT threads were all worked out in the 90s.  We
have last-mover advantage here.

> Also, which other projects did this transition? Is there something we
> could learn from them? Were they restricted to much smaller list of
> platforms?

Apache may be interesting.  Wide ecosystem of extensions.




Re: Let's make PostgreSQL multi-threaded

2023-06-07 Thread Tomas Vondra



On 6/5/23 17:33, Heikki Linnakangas wrote:
> On 05/06/2023 11:18, Tom Lane wrote:
>> Heikki Linnakangas  writes:
>>> I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
>>> so that the whole server runs in a single process, with multiple
>>> threads. It has been discussed many times in the past, last thread on
>>> pgsql-hackers was back in 2017 when Konstantin made some experiments
>>> [0].
>>
>>> I feel that there is now pretty strong consensus that it would be a good
>>> thing, more so than before. Lots of work to get there, and lots of
>>> details to be hashed out, but no objections to the idea at a high level.
>>
>>> The purpose of this email is to make that silent consensus explicit. If
>>> you have objections to switching from the current multi-process
>>> architecture to a single-process, multi-threaded architecture, please
>>> speak up.
>>
>> For the record, I think this will be a disaster.  There is far too much
>> code that will get broken, largely silently, and much of it is not
>> under our control.
> 
> Noted. Other large projects have gone through this transition. It's not
> easy, but it's a lot easier now than it was 10 years ago. The platform
> and compiler support is there now, all libraries have thread-safe
> interfaces, etc.
> 

Is the platform support really there for all platforms we want/intend to
support? I have no problem believing that for modern Linux/BSD systems,
but what about the older stuff we currently support.

Also, which other projects did this transition? Is there something we
could learn from them? Were they restricted to much smaller list of
platforms?

> I don't expect you or others to buy into any particular code change at
> this point, or to contribute time into it. Just to accept that it's a
> worthwhile goal. If the implementation turns out to be a disaster, then
> it won't be accepted, of course. But I'm optimistic.
> 

I personally am not opposed to the effort in principle, but how do you
even evaluate cost and benefits for a transition like this? I have no
idea how to quantify the costs/benefits for this as a single change.

I've seen some benchmarks in the past, but it's hard to say which of
these improvements are possible only with threads, and what would be
doable with less invasive changes with the process model.

IMHO the only way to move this forward is to divide this into smaller
changes, each of which gives us some benefit we'd want anyway. For
example, this thread already mentioned improving handling of many
connections. AFAICS that requires isolating "session state", which seems
useful even without a full switch to threads as it makes connection
pooling simpler. It should be easier to get a buy-in for these changes,
while introducing abstractions simplifying the switch to threads.



regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Let's make PostgreSQL multi-threaded

2023-06-07 Thread Yura Sokolov

07.06.2023 15:53, Robert Haas wrote:
> Right now, if you need a bit
> of additional session-local state, you just declare a variable and
> you're all set. That's not a perfect system and does cause some
> problems, but we can't go from there to a system where it's impossible
> to add session-local state without hacking core.

> or else it needs to
> be designed in some kind of extensible way that doesn't require it to
> know the full details of every sort of object that's being used as
> session-local state anywhere in the system.

And it is quite possible, although with some indirection involved.

For example, say we want to add a session variable "my_hello_var".
We first need to declare an "offset variable".
Then we register it in the session.
And then we use a function and/or macros to get the actual address:

/* session.h */
extern size_t RegisterSessionVar(size_t size);
extern void *CurSessionVar(size_t offset);


/* session.c */
typedef struct Session {
    char *vars;
} Session;

static _Thread_local Session *curSession;
static size_t sessionVarsSize = 0;

/* Reserve size bytes in the per-session block and return their offset. */
size_t
RegisterSessionVar(size_t size)
{
    size_t off = sessionVarsSize;
    sessionVarsSize += size;
    return off;
}

/* Translate an offset into an address within the current session. */
void *
CurSessionVar(size_t offset)
{
    return curSession->vars + offset;
}

/* module_internal.h */

typedef int my_hello_var_t;
extern size_t my_hello_var_offset;

/* access macro */
#define my_hello_var \
    (*(my_hello_var_t *) (CurSessionVar(my_hello_var_offset)))

/* module.c */
size_t my_hello_var_offset = 0;

void
PG_init(void)
{
    /* remember where this module's variable lives in every session */
    my_hello_var_offset = RegisterSessionVar(sizeof(my_hello_var_t));
}

For security reasons, offset could be mangled.
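
A self-contained variant of the scheme above, as a sketch only, so that
the behaviour can be tried out: CreateSession() and SetCurrentSession()
are hypothetical helpers added here, and the alignment rounding in
RegisterSessionVar() is an assumption that is not part of the outline
above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Session
{
    char *vars;                 /* per-session block holding all variables */
} Session;

static _Thread_local Session *curSession = NULL;
static size_t sessionVarsSize = 0;

/* Reserve size bytes, rounded up for alignment, and return the offset. */
static size_t
RegisterSessionVar(size_t size)
{
    size_t align = _Alignof(max_align_t);
    size_t off = (sessionVarsSize + align - 1) / align * align;

    sessionVarsSize = off + size;
    return off;
}

/* Allocate a zeroed session; call after all variables are registered. */
static Session *
CreateSession(void)
{
    Session *s = malloc(sizeof(Session));

    if (s == NULL)
        return NULL;
    s->vars = calloc(1, sessionVarsSize > 0 ? sessionVarsSize : 1);
    if (s->vars == NULL)
    {
        free(s);
        return NULL;
    }
    return s;
}

/* Make s the session seen by the calling thread. */
static void
SetCurrentSession(Session *s)
{
    curSession = s;
}

/* Translate a registered offset into an address in the current session. */
static void *
CurSessionVar(size_t offset)
{
    return curSession->vars + offset;
}
```

Each session gets its own copy of every registered variable, and
switching the current session switches which copy an offset resolves to.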

--

regards,
Yura Sokolov





Re: Let's make PostgreSQL multi-threaded

2023-06-07 Thread Ashutosh Bapat
On Mon, Jun 5, 2023 at 8:22 PM Heikki Linnakangas  wrote:
>
> I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
> so that the whole server runs in a single process, with multiple
> threads. It has been discussed many times in the past, last thread on
> pgsql-hackers was back in 2017 when Konstantin made some experiments [0].
>
> I feel that there is now pretty strong consensus that it would be a good
> thing, more so than before. Lots of work to get there, and lots of
> details to be hashed out, but no objections to the idea at a high level.
>
> The purpose of this email is to make that silent consensus explicit. If
> you have objections to switching from the current multi-process
> architecture to a single-process, multi-threaded architecture, please
> speak up.
>
> If there are no major objections, I'm going to update the developer FAQ,
> removing the excuses there for why we don't use threads [1]. And we can
> start to talk about the path to get there. Below is a list of some
> hurdles and proposed high-level solutions. This isn't an exhaustive
> list, just some of the most obvious problems:
>
> # Transition period
>
> The transition surely cannot be done fully in one release. Even if we
> could pull it off in core, extensions will need more time to adapt.
> There will be a transition period of at least one release, probably
> more, where you can choose multi-process or multi-thread model using a
> GUC. Depending on how it goes, we can document it as experimental at first.
>
> # Thread per connection
>
> To get started, it's most straightforward to have one thread per
> connection, just replacing backend process with a backend thread. In the
> future, we might want to have a thread pool with some kind of a
> scheduler to assign active queries to worker threads. Or multiple
> threads per connection, or spawn additional helper threads for specific
> tasks. But that's future work.

With multiple processes, we can use all the available cores (at least
theoretically if all those processes are independent). But is that
guaranteed with a single-process multi-thread model? A Google search didn't
turn up a definitive answer to that. Usually it depends upon the OS and
architecture.

Maybe a good start is to start using threads instead of parallel
workers e.g. for parallel vacuum, parallel query and so on while
leaving the processes for connections and leaders. That itself might
take significant time. Based on that experience move to a completely
threaded model. Based on my experience with other similar products, I
think we will settle on a multi-process multi-thread model.

-- 
Best Wishes,
Ashutosh Bapat




Re: Let's make PostgreSQL multi-threaded

2023-06-07 Thread Robert Haas
On Tue, Jun 6, 2023 at 10:02 PM Tom Lane  wrote:
> I agree that if we were building this system from scratch today,
> we'd probably choose thread-per-session not process-per-session.
> But the costs of getting to that from where we are will be enormous.
> I seriously doubt that the net benefits could justify that work,
> no matter how long you want to look forward.  It's not really
> significantly different from "let's rewrite the server in
> C++/Rust/$latest_hotness".

Well, I don't know, I think that's a bunch of things that are not all
the same. Rewriting the server in a whole different programming
language would be a massive effort. I can't really see anyone
volunteering to rewrite a million lines of C (or whatever we've got)
in Rust, and I'm not sure who would use the result if they did, or
why. We could, perhaps, allow new source files to be written in Rust
while keeping old ones written in C, but then every hacker has to know
two languages, and having code written in both languages manipulating
the same data structures would probably be a recipe for confusion and
bugs. It's hard to believe that the upsides would be worth the pain.
Maybe transition to C++ would be easier, or maybe it wouldn't, I'm not
sure. But from my point of view, the issue here is simply that
stop-the-world-and-change-everything is not a viable way forward for a
project the size of PostgreSQL, but incremental changes are
potentially acceptable if the benefits outweigh the drawbacks.

So what are the costs, exactly, of transition to a threaded model? It
seems to me that there's basically one problem: global variables.
Sure, there's a bunch of stuff around process management that would
likely have to be revised in some way, but that's not that much code
and wouldn't have that much impact on unrelated development. However,
the project's widespread and often gratuitous use of global variables
would have to be addressed in some way, and I think that will pretty
much inevitably involve touching all of those global variable
declarations in some way. Now, if we can get away with simply marking
all of those thread-local, then it's of the same general flavor as
PGDLLIMPORT. I am aware that you think that PGDLLIMPORT markings are
ugly as sin, and these would be more widespread since they'd have to
be applied to literally every global variable, including file-local
ones. However, it's hard to imagine that adding such markings would
cause PostgreSQL development to grind to a halt. It would cause minor
rebasing pain and that's about it. I hope that we'd have some tool
that would make the build fail if any markings are missing and
everybody would be annoyed until they finished rebasing all of their
WIP patches and then that would just be how things are. It's not
*lovely* but it doesn't sound that bad either.

In my mind, the bigger question is how much further than that do you
have to go? I think I remember a previous conversation with Andres
where he opined that thread-local variables are "really expensive"
(and I apologize in advance if I'm mis-remembering this). Now, Andres
is not a man who accepts a tax on performance of any size without a
fight, so his "really expensive" might turn out to resemble my "pretty
cheap." However, if widespread use of TLS is too expensive and we have
to start rewriting code to not depend on global variables, that's
going to be more of a problem. If we can get by with doing such
rewrites only in performance-critical places, it still might not be
too bad. Personally, I think the degree of dependence that PostgreSQL
has on global variables is pretty excessive and I don't think that a
certain amount of refactoring to reduce it would be a bad thing. If it
turns into an infinite series of hastily-written patches to rejigger
every source file we have, though, then I'm not really on board with
that.

Heikki mentions the idea of having a central Session object and just
passing that around. I have a hard time believing that's going to work
out nicely. First, it's not extensible. Right now, if you need a bit
of additional session-local state, you just declare a variable and
you're all set. That's not a perfect system and does cause some
problems, but we can't go from there to a system where it's impossible
to add session-local state without hacking core. Second, we will be
sad if session.h ends up #including every other header file that
defines a data structure anywhere in the backend. Or at least I'll be
sad. I'm not actually against the idea of having some kind of session
object that we pass around, but I think it either needs to be limited
to a relatively small set of well-defined things, or else it needs to
be designed in some kind of extensible way that doesn't require it to
know the full details of every sort of object that's being used as
session-local state anywhere in the system. I haven't really seen any
convincing design ideas around this yet.
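
For the sake of discussion, one possible shape for the "extensible"
variant is a session object that holds only opaque slots, with each
module reserving a slot at load time, much like pthread_key_create().
This is a rough sketch under that assumption, not a proposed design:
Session, session_register_slot(), session_get_slot(),
session_release(), and the my_ext_* names are all hypothetical.
Registration is assumed to happen before any session starts, so the
registry itself is not protected by a lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SESSION_MAX_SLOTS 64    /* arbitrary limit for this sketch */

/* The session knows nothing about what lives in each slot. */
typedef struct Session
{
    void *slots[SESSION_MAX_SLOTS];
} Session;

/* Slot sizes, filled in at module load time and shared by all sessions. */
static size_t slot_sizes[SESSION_MAX_SLOTS];
static int    num_slots = 0;

/* Reserve a slot for 'size' bytes of state; returns its id, or -1. */
int
session_register_slot(size_t size)
{
    if (size == 0 || num_slots >= SESSION_MAX_SLOTS)
        return -1;
    slot_sizes[num_slots] = size;
    return num_slots++;
}

/* Return this session's zero-initialized state, allocated on first use. */
void *
session_get_slot(Session *session, int slot)
{
    assert(session != NULL);
    if (slot < 0 || slot >= num_slots)
        return NULL;
    if (session->slots[slot] == NULL)
        session->slots[slot] = calloc(1, slot_sizes[slot]);
    return session->slots[slot];
}

/* Free all state owned by a session. */
void
session_release(Session *session)
{
    assert(session != NULL);
    for (int i = 0; i < num_slots; i++)
    {
        free(session->slots[i]);
        session->slots[i] = NULL;
    }
}

/* A hypothetical extension adding session-local state without touching core. */
typedef struct MyExtState
{
    int calls;
} MyExtState;

static int my_ext_slot = -1;

void
my_ext_init(void)
{
    my_ext_slot = session_register_slot(sizeof(MyExtState));
}

/* Count calls per session; returns the new count, or -1 on failure. */
int
my_ext_count_call(Session *session)
{
    MyExtState *state = session_get_slot(session, my_ext_slot);

    return state ? ++state->calls : -1;
}
```

The extension's struct is defined in its own file, so nothing like
session.h ever needs to include it. The price is an extra lookup on
every access, which is the kind of overhead that would need measuring
in hot paths.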

But I think jumping to the conclusion that the migration path 
