Re: [PERFORM] Profiling PostgreSQL
Increasing the shared_buffers size improved the performance by 15%. The trend remains the same though: a steep drop in performance after a certain number of clients.

My deployment is NUMA-aware. I allocate cores that reside on the same socket. Once I reach the maximum number of cores, I start allocating cores from a neighbouring socket.

I'll try to print the number of spins_per_delay for each experiment... just in case I get something interesting.

On Fri, May 23, 2014 at 7:57 PM, Jeff Janes jeff.ja...@gmail.com wrote:

> On Fri, May 23, 2014 at 10:25 AM, Dimitris Karampinas dkaram...@gmail.com wrote:
>
>> I want to bypass any disk bottleneck so I store all the data in ramfs (the purpose of the project is to profile pg, so I don't care about data loss if anything goes wrong). Since my data are memory resident, I thought the size of the shared buffers wouldn't play much of a role, yet I have to admit that I saw a difference in performance when modifying the shared_buffers parameter.
>
> In which direction? If making shared_buffers larger improves things, that suggests that you have contention on the BufFreelistLock. Increasing shared_buffers reduces buffer churn (assuming you increase it by enough) and so decreases that contention.
>
>> I use taskset to control the number of cores that PostgreSQL is deployed on.
>
> It can be important which bits you set. For example, if you have 4 sockets, each with a quad-core, you would probably maximize the consequences of spinlock contention by putting one process on each socket, rather than putting them all on the same socket.
>
>> Is there any parameter/variable in the system that is set dynamically and depends on the number of cores?
>
> The number of spins a spinlock goes through before sleeping, spins_per_delay, is determined dynamically based on how often a tight loop pays off. But I don't think this is very sensitive to the exact number of processors, just the difference between 1 and more than 1.
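For reference, a minimal sketch of this pinning strategy, assuming cores 0-7 sit on socket 0 and 8-15 on socket 1 (the real mapping is machine-specific; check it with lscpu). The affinity mask set on the postmaster is inherited by the backends it forks:

    # Show which logical CPU lives on which socket
    lscpu -e=CPU,SOCKET

    # Fill socket 0 first: pin the postmaster (and thus all backends) to cores 0-7
    taskset -c 0-7 pg_ctl -D "$PGDATA" start

    # Once socket 0 is exhausted, extend onto the neighbouring socket
    taskset -c 0-15 pg_ctl -D "$PGDATA" start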
Re: [PERFORM] Profiling PostgreSQL
On Sun, May 25, 2014 at 1:26 PM, Dimitris Karampinas dkaram...@gmail.com wrote:

> My deployment is NUMA-aware. I allocate cores that reside on the same socket. Once I reach the maximum number of cores, I start allocating cores from a neighbouring socket.

I'm not sure if it solves your issue, but on a NUMA environment and a recent version of the Linux kernel, you should try disabling vm.zone_reclaim_mode, as it seems to cause performance degradation for database workloads; see [1] and [2].

[1] http://www.postgresql.org/message-id/500616cb.3070...@2ndquadrant.com
[2] http://frosty-postgres.blogspot.com.br/2012/08/postgresql-numa-and-zone-reclaim-mode.html

Best regards,
--
Matheus de Oliveira
Database Analyst
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres
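Checking and disabling it is a one-liner via the standard sysctl interface (requires root; the change does not survive a reboot unless also added to /etc/sysctl.conf):

    # 0 means zone reclaim is off; anything else means it is on
    cat /proc/sys/vm/zone_reclaim_mode

    # Turn it off for the running kernel
    sysctl -w vm.zone_reclaim_mode=0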
Re: [PERFORM] Profiling PostgreSQL
Thanks for your answers. A script around pstack worked for me.

(I'm not sure if I should open a new thread; I hope it's OK to ask another question here.)

For the workload I run, it seems that PostgreSQL scales with the number of concurrent clients up to the point where these reach the number of cores (more or less). Further increasing the number of clients leads to dramatic performance degradation. pstack and perf show that backends block on LWLockAcquire calls, so one could assume that the system slows down because multiple concurrent transactions access the same data. However, I did the two following experiments:

1) I completely removed the UPDATE transactions from my workload. The throughput turned out to be better, yet the trend was the same: increasing the number of clients has a very negative performance impact.

2) I deployed PostgreSQL on more cores. The throughput improved a lot. If the problem were due to concurrency control, the throughput should remain the same no matter the number of hardware contexts.

Any insight into why the system behaves like this?

Cheers,
Dimitris

On Fri, May 23, 2014 at 1:39 AM, Michael Paquier michael.paqu...@gmail.com wrote:

> On Thu, May 22, 2014 at 10:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Call graph data usually isn't trustworthy unless you built the program with -fno-omit-frame-pointer ...
>
> This page is full of ideas as well: https://wiki.postgresql.org/wiki/Profiling_with_perf
> --
> Michael
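The script itself isn't shown in the thread; a minimal sketch of such a pstack-based sampler might look like this (the argument names and sample count are placeholders, and the awk field assumes the usual "#N 0xADDR in func ()" pstack output):

    #!/bin/sh
    # Usage: ./sample_stacks.sh <backend_pid> [nsamples]
    PID=$1
    N=${2:-100}

    # Take N stack snapshots, then count the most frequent frames:
    # the hottest code paths dominate the tally.
    for i in $(seq 1 "$N"); do
        pstack "$PID"
        sleep 0.1
    done | grep '^#' | awk '{print $4}' | sort | uniq -c | sort -rn | head -20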
Re: [PERFORM] Profiling PostgreSQL
On 23.5.2014 16:41, Dimitris Karampinas dkaram...@gmail.com wrote:

> Thanks for your answers. A script around pstack worked for me.
>
> (I'm not sure if I should open a new thread; I hope it's OK to ask another question here.)
>
> For the workload I run, it seems that PostgreSQL scales with the number of concurrent clients up to the point where these reach the number of cores (more or less). Further increasing the number of clients leads to dramatic performance degradation. pstack and perf show that backends block on LWLockAcquire calls, so one could assume that the system slows down because multiple concurrent transactions access the same data. However, I did the two following experiments:
>
> 1) I completely removed the UPDATE transactions from my workload. The throughput turned out to be better, yet the trend was the same: increasing the number of clients has a very negative performance impact.
>
> 2) I deployed PostgreSQL on more cores. The throughput improved a lot. If the problem were due to concurrency control, the throughput should remain the same no matter the number of hardware contexts.
>
> Any insight into why the system behaves like this?

Physical limits: there are two possible bottlenecks, CPU or I/O. Postgres uses one CPU per session, so in a CPU-intensive benchmark the maximum should come with about as many CPU-bound workers as cores. Past that point the workers share CPUs, but total throughput should stay roughly the same up to circa 10x the number of CPUs (depends on the test).

> Cheers,
> Dimitris
>
> On Fri, May 23, 2014 at 1:39 AM, Michael Paquier michael.paqu...@gmail.com wrote:
>> On Thu, May 22, 2014 at 10:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> Call graph data usually isn't trustworthy unless you built the program with -fno-omit-frame-pointer ...
>> This page is full of ideas as well: https://wiki.postgresql.org/wiki/Profiling_with_perf
>> --
>> Michael
Re: [PERFORM] Profiling PostgreSQL
On Fri, May 23, 2014 at 7:40 AM, Dimitris Karampinas dkaram...@gmail.com wrote:

> Thanks for your answers. A script around pstack worked for me.
>
> (I'm not sure if I should open a new thread; I hope it's OK to ask another question here.)
>
> For the workload I run, it seems that PostgreSQL scales with the number of concurrent clients up to the point where these reach the number of cores (more or less). Further increasing the number of clients leads to dramatic performance degradation. pstack and perf show that backends block on LWLockAcquire calls, so one could assume that the system slows down because multiple concurrent transactions access the same data. However, I did the two following experiments:
>
> 1) I completely removed the UPDATE transactions from my workload. The throughput turned out to be better, yet the trend was the same: increasing the number of clients has a very negative performance impact.

Currently, acquisition and release of all LWLocks, even in shared mode, are protected by spinlocks, which are exclusive. So they cause a lot of contention even on read-only workloads. Also, if the working set fits in RAM but not in shared_buffers, you will have a lot of exclusive locks on the buffer freelist and the buffer mapping tables.

> 2) I deployed PostgreSQL on more cores. The throughput improved a lot. If the problem were due to concurrency control, the throughput should remain the same no matter the number of hardware contexts.

Hardware matters! How did you change the number of cores?

Cheers,
Jeff
Re: [PERFORM] Profiling PostgreSQL
I want to bypass any disk bottleneck so I store all the data in ramfs (the purpose of the project is to profile pg, so I don't care about data loss if anything goes wrong). Since my data are memory resident, I thought the size of the shared buffers wouldn't play much of a role, yet I have to admit that I saw a difference in performance when modifying the shared_buffers parameter.

I use taskset to control the number of cores that PostgreSQL is deployed on. Is there any parameter/variable in the system that is set dynamically and depends on the number of cores?

Cheers,
Dimitris

On Fri, May 23, 2014 at 6:52 PM, Jeff Janes jeff.ja...@gmail.com wrote:

> On Fri, May 23, 2014 at 7:40 AM, Dimitris Karampinas dkaram...@gmail.com wrote:
>
>> Thanks for your answers. A script around pstack worked for me.
>>
>> (I'm not sure if I should open a new thread; I hope it's OK to ask another question here.)
>>
>> For the workload I run, it seems that PostgreSQL scales with the number of concurrent clients up to the point where these reach the number of cores (more or less). Further increasing the number of clients leads to dramatic performance degradation. pstack and perf show that backends block on LWLockAcquire calls, so one could assume that the system slows down because multiple concurrent transactions access the same data. However, I did the two following experiments:
>>
>> 1) I completely removed the UPDATE transactions from my workload. The throughput turned out to be better, yet the trend was the same: increasing the number of clients has a very negative performance impact.
>
> Currently, acquisition and release of all LWLocks, even in shared mode, are protected by spinlocks, which are exclusive. So they cause a lot of contention even on read-only workloads. Also, if the working set fits in RAM but not in shared_buffers, you will have a lot of exclusive locks on the buffer freelist and the buffer mapping tables.
>
>> 2) I deployed PostgreSQL on more cores. The throughput improved a lot. If the problem were due to concurrency control, the throughput should remain the same no matter the number of hardware contexts.
>
> Hardware matters! How did you change the number of cores?
>
> Cheers,
> Jeff
Re: [PERFORM] Profiling PostgreSQL
On Fri, May 23, 2014 at 10:25 AM, Dimitris Karampinas dkaram...@gmail.com wrote:

> I want to bypass any disk bottleneck so I store all the data in ramfs (the purpose of the project is to profile pg, so I don't care about data loss if anything goes wrong). Since my data are memory resident, I thought the size of the shared buffers wouldn't play much of a role, yet I have to admit that I saw a difference in performance when modifying the shared_buffers parameter.

In which direction? If making shared_buffers larger improves things, that suggests that you have contention on the BufFreelistLock. Increasing shared_buffers reduces buffer churn (assuming you increase it by enough) and so decreases that contention.

> I use taskset to control the number of cores that PostgreSQL is deployed on.

It can be important which bits you set. For example, if you have 4 sockets, each with a quad-core, you would probably maximize the consequences of spinlock contention by putting one process on each socket, rather than putting them all on the same socket.

> Is there any parameter/variable in the system that is set dynamically and depends on the number of cores?

The number of spins a spinlock goes through before sleeping, spins_per_delay, is determined dynamically based on how often a tight loop pays off. But I don't think this is very sensitive to the exact number of processors, just the difference between 1 and more than 1.
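To make the two placements concrete, a hypothetical sketch for a 4-socket quad-core box where socket N holds cores 4N through 4N+3 (verify the actual layout with lscpu; the affinity mask set on the postmaster is inherited by its backends):

    # All backends on one socket: contended spinlock cache lines stay on-socket
    taskset -c 0-3 pg_ctl -D "$PGDATA" start

    # One core per socket: every contended cache line bounces across
    # the interconnect -- roughly the worst case for spinlock contention
    taskset -c 0,4,8,12 pg_ctl -D "$PGDATA" start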
[PERFORM] Profiling PostgreSQL
Is there any way to get the call stack of a function when profiling PostgreSQL with perf? I configured with --enable-debug, I run a benchmark against the system, and I'm able to identify a bottleneck: 40% of the time is spent on a spinlock, yet I cannot find out the code path that gets me there. Using --call-graph with perf record didn't seem to help.

Any ideas?

Cheers,
Dimitris
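For context, a typical invocation looks like the following (the 30-second window and process selection are illustrative; DWARF unwinding needs a reasonably recent perf but works even without frame pointers):

    # Sample all postgres processes for 30 seconds, recording call graphs
    perf record -g -p $(pgrep -d, postgres) -- sleep 30

    # Alternatively, use DWARF-based unwinding instead of frame pointers
    perf record --call-graph dwarf -p $(pgrep -d, postgres) -- sleep 30

    # Browse the profile with call chains expanded
    perf report -g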
Re: [PERFORM] Profiling PostgreSQL
On 5/22/2014 7:27 AM, Dimitris Karampinas wrote:

> Is there any way to get the call stack of a function when profiling PostgreSQL with perf? I configured with --enable-debug, I run a benchmark against the system, and I'm able to identify a bottleneck: 40% of the time is spent on a spinlock, yet I cannot find out the code path that gets me there. Using --call-graph with perf record didn't seem to help.
>
> Any ideas?

Can you arrange to run 'pstack' a few times on the target process (either manually or with a shell script)? If the probability of the process being in the spinning state is high, then this approach should snag you at least one call stack.
Re: [PERFORM] Profiling PostgreSQL
Dimitris Karampinas dkaram...@gmail.com writes:

> Is there any way to get the call stack of a function when profiling PostgreSQL with perf? I configured with --enable-debug, I run a benchmark against the system, and I'm able to identify a bottleneck: 40% of the time is spent on a spinlock, yet I cannot find out the code path that gets me there. Using --call-graph with perf record didn't seem to help.

Call graph data usually isn't trustworthy unless you built the program with -fno-omit-frame-pointer ...

regards, tom lane
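When building from source this amounts to something like the following (a sketch; -O2 -g mirror what --enable-debug typically gives you with gcc):

    # Rebuild keeping frame pointers so perf can walk the stack
    ./configure --enable-debug CFLAGS="-O2 -g -fno-omit-frame-pointer"
    make && make install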
Re: [PERFORM] Profiling PostgreSQL
On Thu, May 22, 2014 at 10:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> Call graph data usually isn't trustworthy unless you built the program with -fno-omit-frame-pointer ...

This page is full of ideas as well: https://wiki.postgresql.org/wiki/Profiling_with_perf
--
Michael
[PERFORM] profiling postgresql queries?
hi

i'm not totally sure i should ask on this mailing list - so if you think i should better ask someplace else, please let me know.

the problem i have is that specific queries (inserts and updates) take a long time to run. of course i do vacuum analyze frequently, and i also use explain analyze on queries. the problem is that both the inserts and updates operate on heavily-triggered tables. and it made me wonder - is there a way to tell how much of the backend's time was spent on triggers, index updates and so on? like:

total query time: 1 second
trigger a: 0.50 second
trigger b: 0.25 second
index update: 0.1 second

something like this. is it possible? will it ever be possible?

hubert

--
hubert lubaczewski
Network Operations Center
eo Networks Sp. z o.o.
Re: [PERFORM] profiling postgresql queries?
hubert lubaczewski [EMAIL PROTECTED] writes:

> and it made me wonder - is there a way to tell how much of the backend's time was spent on triggers, index updates and so on?

In CVS tip, EXPLAIN ANALYZE will break out the time spent in each trigger. This is not in any released version, but if you're desperate you could load up a play server with your data and test.

regards, tom lane
Re: [PERFORM] profiling postgresql queries?
On Tue, Apr 12, 2005 at 10:18:31AM -0400, Alex Turner wrote:

> Speaking of triggers... Is there any plan to speed up plpgsql triggers? Fairly simple crosstable insert triggers seem to slow my inserts to a crawl.

plpgsql is quite fast actually. if some triggers slow inserts too much, i guess you should be able to speed them up with a performance review of the trigger code.

depesz

--
hubert lubaczewski
Network Operations Center
eo Networks Sp. z o.o.
Re: [PERFORM] profiling postgresql queries?
On Tue, Apr 12, 2005 at 12:46:43PM +0200, hubert lubaczewski wrote:

> the problem is that both the inserts and updates operate on heavily-triggered tables. and it made me wonder - is there a way to tell how much of the backend's time was spent on triggers, index updates and so on? like:
>
> total query time: 1 second
> trigger a: 0.50 second
> trigger b: 0.25 second
> index update: 0.1 second

EXPLAIN ANALYZE in 8.1devel (CVS HEAD) prints a few statistics for triggers:

EXPLAIN ANALYZE UPDATE foo SET x = 10 WHERE x = 20;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Scan using foo_x_idx on foo  (cost=0.00..14.44 rows=10 width=22) (actual time=0.184..0.551 rows=7 loops=1)
   Index Cond: (x = 20)
 Trigger row_trig1: time=1.625 calls=7
 Trigger row_trig2: time=1.346 calls=7
 Trigger stmt_trig1: time=1.436 calls=1
 Total runtime: 9.659 ms
(6 rows)

8.1devel changes frequently (sometimes requiring initdb) and isn't suitable for production, but if the trigger statistics would be helpful then you could set up a test server and load a copy of your database into it. Just beware that because it's bleeding edge, it might destroy your data and it might behave differently than released versions.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
Re: [PERFORM] profiling postgresql queries?
On Tue, Apr 12, 2005 at 08:43:59AM -0600, Michael Fuhr wrote:

> 8.1devel changes frequently (sometimes requiring initdb) and isn't suitable for production, but if the trigger statistics would be helpful then you could set up a test server and load a copy of your database into it. Just beware that because it's bleeding edge, it might destroy your data and it might behave differently than released versions.

great. this is exactly what i need. thanks for the hint.

depesz

--
hubert lubaczewski
Network Operations Center
eo Networks Sp. z o.o.