I want to bypass any disk bottleneck, so I store all the data in ramfs (the purpose of the project is to profile pg, so I don't care about data loss if anything goes wrong). Since my data are memory resident, I thought the size of the shared buffers wouldn't play much of a role, yet I have to admit that I saw a difference in performance when modifying the shared_buffers parameter.
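For reference, my setup looks roughly like this (the mount point and data directory are illustrative, not my actual paths):

    # keep the whole cluster in memory; ramfs so nothing spills to disk
    mount -t ramfs ramfs /mnt/pgram
    chown postgres:postgres /mnt/pgram
    initdb -D /mnt/pgram/data
    # the parameter I vary between runs:
    echo "shared_buffers = 2GB" >> /mnt/pgram/data/postgresql.conf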
I use taskset to control the number of cores that PostgreSQL is deployed on. Is there any parameter/variable in the system that is set dynamically and depends on the number of cores?

Cheers,
Dimitris

On Fri, May 23, 2014 at 6:52 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Fri, May 23, 2014 at 7:40 AM, Dimitris Karampinas
> <dkaram...@gmail.com> wrote:
>
>> Thanks for your answers. A script around pstack worked for me.
>>
>> (I'm not sure if I should open a new thread; I hope it's OK to ask
>> another question here.)
>>
>> For the workload I run, it seems that PostgreSQL scales with the number
>> of concurrent clients up to the point where these reach the number of
>> cores (more or less). Further increases in the number of clients lead
>> to dramatic performance degradation. pstack and perf show that backends
>> block on LWLockAcquire calls, so one could assume that the system slows
>> down because of multiple concurrent transactions accessing the same
>> data. However, I ran the two following experiments:
>>
>> 1) I completely removed the UPDATE transactions from my workload. The
>> throughput turned out to be better, yet the trend was the same:
>> increasing the number of clients had a very negative performance
>> impact.
>
> Currently, acquisition and release of all LWLocks, even in shared mode,
> are protected by spinlocks, which are exclusive. So they cause a lot of
> contention even on read-only workloads. Also, if the working set fits in
> RAM but not in shared_buffers, you will have a lot of exclusive locks on
> the buffer freelist and the buffer mapping tables.
>
>> 2) I deployed PostgreSQL on more cores. The throughput improved a lot.
>> If the problem were due to concurrency control, the throughput should
>> remain the same, no matter the number of hardware contexts.
>
> Hardware matters! How did you change the number of cores?
>
> Cheers,
>
> Jeff
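P.S. In case it helps, this is roughly how I pin the server to a given number of cores; the core list below is just an example:

    # the affinity mask is inherited by every backend the postmaster
    # forks, so this limits the whole cluster to cores 0-7
    taskset -c 0-7 pg_ctl -D /mnt/pgram/data start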