Increasing the shared_buffers size improved performance by 15%. The trend remains the same though: a steep drop in performance after a certain number of clients.
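For context, each of these data points comes from sweeping the number of clients against a fixed configuration. The snippet below is only a simplified illustration of that kind of run, not the exact benchmark I use: pgbench, the ramfs path, the 8GB value and the client counts are all placeholders.

    # Illustrative only: start the server (data directory on ramfs) with a
    # larger shared_buffers, then sweep client counts with a read-only workload.
    pg_ctl -D /mnt/ramfs/pgdata -o "-c shared_buffers=8GB" start
    for c in 8 16 32 64 128; do
        pgbench -S -c $c -j $c -T 60 bench    # record throughput per client count
    done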
My deployment is "NUMA-aware": I allocate cores that reside on the same socket, and once I reach the maximum number of cores on that socket I start allocating cores from a neighbouring one. (A rough sketch of how I pin the server is at the bottom of this mail, below the quoted text.)

I'll also try to print the value of spins_per_delay for each experiment... just in case I get something interesting.

On Fri, May 23, 2014 at 7:57 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Fri, May 23, 2014 at 10:25 AM, Dimitris Karampinas <dkaram...@gmail.com>
> wrote:
>
>> I want to bypass any disk bottleneck so I store all the data in ramfs
>> (the purpose of the project is to profile pg, so I don't care about data
>> loss if anything goes wrong).
>> Since my data are memory resident, I thought the size of the shared
>> buffers wouldn't play much of a role, yet I have to admit that I saw a
>> difference in performance when modifying the shared_buffers parameter.
>
> In which direction? If making shared_buffers larger improves things, that
> suggests that you have contention on the BufFreelistLock. Increasing
> shared_buffers reduces buffer churn (assuming you increase it by enough)
> and so decreases that contention.
>
>> I use taskset to control the number of cores that PostgreSQL is deployed
>> on.
>
> It can be important which bits you set. For example, if you have 4 sockets,
> each with a quad-core CPU, you would probably maximize the consequences of
> spinlock contention by putting one process on each socket, rather than
> putting them all on the same socket.
>
>> Is there any parameter/variable in the system that is set dynamically and
>> depends on the number of cores?
>
> The number of spins a spinlock goes through before sleeping,
> spins_per_delay, is determined dynamically based on how often a tight loop
> "pays off". But I don't think this is very sensitive to the exact number of
> processors, just the difference between 1 and more than 1.
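Since the way cores are picked came up, here is roughly what the pinning looks like. This is only a sketch for a hypothetical two-socket machine where cores 0-7 sit on socket 0 and cores 8-15 on socket 1; the real core IDs come from lscpu/numactl on my box, and the data directory path is a placeholder.

    # Check which logical CPUs belong to which socket / NUMA node.
    lscpu --extended        # or: numactl --hardware

    # Experiments with up to 8 cores: stay on socket 0.
    taskset -c 0-7 pg_ctl -D /mnt/ramfs/pgdata start

    # Beyond that: spill over onto the neighbouring socket.
    taskset -c 0-11 pg_ctl -D /mnt/ramfs/pgdata start

Since the backends inherit the affinity mask from pg_ctl, every PostgreSQL process stays on the allowed cores.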