Re: [PERFORM] Profiling PostgreSQL

2014-05-25 Thread Dimitris Karampinas
Increasing the shared_buffers size improved the performance by 15%. The
trend remains the same though: steep drop in performance after a certain
number of clients.

My deployment is NUMA-aware. I allocate cores that reside on the same
socket. Once I reach the maximum number of cores, I start allocating cores
from a neighbouring socket.
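
For reference, one way to express this kind of socket-by-socket allocation
(numactl is shown only as an illustration since it can also pin memory; the
thread itself uses taskset, and the node numbers depend on the machine):

  numactl --hardware                    # inspect which cores belong to which node
  numactl --cpunodebind=0 --membind=0 pg_ctl -D "$PGDATA" start
  # once node 0 is fully used, extend onto the neighbouring node:
  # numactl --cpunodebind=0,1 --membind=0,1 pg_ctl -D "$PGDATA" start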

I'll try to print the number of spins_per_delay for each experiment... just
in case I get something interesting.


On Fri, May 23, 2014 at 7:57 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Fri, May 23, 2014 at 10:25 AM, Dimitris Karampinas dkaram...@gmail.com
  wrote:

 I want to bypass any disk bottleneck so I store all the data in ramfs
 (the purpose of the project is to profile pg, so I don't care about data loss
 if anything goes wrong).
 Since my data are memory resident, I thought the size of the shared
 buffers wouldn't play much role, yet I have to admit that I saw a difference
 in performance when modifying the shared_buffers parameter.


 In which direction?  If making shared_buffers larger improves things, that
 suggests that you have contention on the BufFreelistLock.  Increasing
 shared_buffers reduces buffer churn (assuming you increase it by enough)
 and so decreases that contention.



 I use taskset to control the number of cores that PostgreSQL is deployed
 on.


 It can be important which bits you set.  For example, if you have 4 sockets,
 each one with a quad-core CPU, you would probably maximize the consequences of
 spinlock contention by putting one process on each socket, rather than
 putting them all on the same socket.



 Is there any parameter/variable in the system that is set dynamically and
 depends on the number of cores?


 The number of spins a spinlock goes through before sleeping,
 spins_per_delay, is determined dynamically based on how often a tight loop
 pays off.  But I don't think this is very sensitive to the exact number
 of processors, just the difference between 1 and more than 1.





Re: [PERFORM] Profiling PostgreSQL

2014-05-25 Thread Matheus de Oliveira
On Sun, May 25, 2014 at 1:26 PM, Dimitris Karampinas dkaram...@gmail.com wrote:

 My deployment is NUMA-aware. I allocate cores that reside on the same
 socket. Once I reach the maximum number of cores, I start allocating cores
 from a neighbouring socket.


I'm not sure if it solves your issue, but in a NUMA environment with a recent
Linux kernel you should try disabling vm.zone_reclaim_mode, as it seems to
cause performance degradation for database workloads; see [1] and [2].

[1] http://www.postgresql.org/message-id/500616cb.3070...@2ndquadrant.com
[2]
http://frosty-postgres.blogspot.com.br/2012/08/postgresql-numa-and-zone-reclaim-mode.html
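
A quick sketch of how to check and disable it on a running system (standard
Linux paths; make the change permanent via /etc/sysctl.conf if it helps):

  cat /proc/sys/vm/zone_reclaim_mode    # 0 means zone reclaim is already off
  sysctl -w vm.zone_reclaim_mode=0      # disable it for the running kernel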

Best regards,
-- 
Matheus de Oliveira
Database Analyst
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres


Re: [PERFORM] Profiling PostgreSQL

2014-05-23 Thread Dimitris Karampinas
Thanks for your answers. A script around pstack worked for me.

(I'm not sure if I should open a new thread, I hope it's OK to ask another
question here)

For the workload I run, it seems that PostgreSQL scales with the number of
concurrent clients up to the point where they reach the number of cores
(more or less).
A further increase in the number of clients leads to dramatic performance
degradation. pstack and perf show that backends block on LWLockAcquire
calls, so one could assume that the system slows down because multiple
concurrent transactions access the same data.
However, I did the following two experiments:
1) I completely removed the UPDATE transactions from my workload. The
throughput turned out to be better, yet the trend was the same: increasing
the number of clients has a very negative performance impact.
2) I deployed PostgreSQL on more cores. The throughput improved a lot. If
the problem were due to concurrency control, the throughput should remain the
same, no matter the number of hardware contexts.
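
For illustration, the kind of client sweep behind these numbers looks roughly
like this (pgbench here is only a stand-in for whatever workload generator is
in use; the database name is a placeholder):

  for c in 4 8 16 32 64 128; do
      pgbench -c "$c" -j "$c" -T 60 -S benchdb | grep tps   # read-only run, 60 s per point
  done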

Any insight into why the system behaves like this?

Cheers,
Dimitris


On Fri, May 23, 2014 at 1:39 AM, Michael Paquier
michael.paqu...@gmail.com wrote:

 On Thu, May 22, 2014 at 10:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Call graph data usually isn't trustworthy unless you built the program
  with -fno-omit-frame-pointer ...
 This page is full of ideas as well:
 https://wiki.postgresql.org/wiki/Profiling_with_perf
 --
 Michael



Re: [PERFORM] Profiling PostgreSQL

2014-05-23 Thread Pavel Stehule
On 23.5.2014 16:41, Dimitris Karampinas dkaram...@gmail.com wrote:

 Thanks for your answers. A script around pstack worked for me.

 (I'm not sure if I should open a new thread, I hope it's OK to ask
another question here)

 For the workload I run, it seems that PostgreSQL scales with the number of
concurrent clients up to the point where they reach the number of cores
(more or less).
 A further increase in the number of clients leads to dramatic performance
degradation. pstack and perf show that backends block on LWLockAcquire
calls, so one could assume that the system slows down because multiple
concurrent transactions access the same data.
 However, I did the following two experiments:
 1) I completely removed the UPDATE transactions from my workload. The
throughput turned out to be better, yet the trend was the same: increasing
the number of clients has a very negative performance impact.
 2) I deployed PostgreSQL on more cores. The throughput improved a lot. If
the problem were due to concurrency control, the throughput should remain the
same, no matter the number of hardware contexts.

 Any insight into why the system behaves like this?

Physical limits: there are two possible bottlenecks, CPU or I/O. Postgres uses
one CPU per session, so with a CPU-intensive benchmark the maximum should come
at roughly as many workers as you have CPUs. Beyond that the workers share
CPUs, but total throughput should stay about the same up to roughly 10x the
CPU count (depending on the test).


 Cheers,
 Dimitris


 On Fri, May 23, 2014 at 1:39 AM, Michael Paquier 
michael.paqu...@gmail.com wrote:

 On Thu, May 22, 2014 at 10:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Call graph data usually isn't trustworthy unless you built the program
  with -fno-omit-frame-pointer ...
 This page is full of ideas as well:
 https://wiki.postgresql.org/wiki/Profiling_with_perf
 --
 Michael




Re: [PERFORM] Profiling PostgreSQL

2014-05-23 Thread Jeff Janes
On Fri, May 23, 2014 at 7:40 AM, Dimitris Karampinas dkaram...@gmail.com wrote:

 Thanks for your answers. A script around pstack worked for me.

 (I'm not sure if I should open a new thread, I hope it's OK to ask another
 question here)

 For the workload I run, it seems that PostgreSQL scales with the number of
 concurrent clients up to the point where they reach the number of cores
 (more or less).
 A further increase in the number of clients leads to dramatic performance
 degradation. pstack and perf show that backends block on LWLockAcquire
 calls, so one could assume that the system slows down because multiple
 concurrent transactions access the same data.
 However, I did the following two experiments:
 1) I completely removed the UPDATE transactions from my workload. The
 throughput turned out to be better, yet the trend was the same: increasing
 the number of clients has a very negative performance impact.


Currently, acquisition and release of all LWLocks, even in shared mode, are
protected by spinlocks, which are exclusive. So they cause a lot of
contention even on read-only workloads.  Also, if the working set fits in
RAM but not in shared_buffers, you will have a lot of exclusive locks on
the buffer freelist and the buffer mapping tables.



 2) I deployed PostgreSQL on more cores. The throughput improved a lot. If
 the problem were due to concurrency control, the throughput should remain the
 same, no matter the number of hardware contexts.


Hardware matters!  How did you change the number of cores?

Cheers,

Jeff


Re: [PERFORM] Profiling PostgreSQL

2014-05-23 Thread Dimitris Karampinas
I want to bypass any disk bottleneck so I store all the data in ramfs (the
purpose of the project is to profile pg, so I don't care about data loss if
anything goes wrong).
Since my data are memory resident, I thought the size of the shared buffers
wouldn't play much role, yet I have to admit that I saw a difference in
performance when modifying the shared_buffers parameter.
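
A minimal sketch of such an in-memory data directory (mount point, size and
paths are placeholders; tmpfs is shown because it takes a size cap, plain
ramfs works the same way):

  mkdir -p /mnt/pgram
  mount -t tmpfs -o size=8G tmpfs /mnt/pgram        # as root; contents vanish on unmount
  initdb -D /mnt/pgram/data
  pg_ctl -D /mnt/pgram/data -l /mnt/pgram/logfile start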

I use taskset to control the number of cores that PostgreSQL is deployed on.

Is there any parameter/variable in the system that is set dynamically and
depends on the number of cores?

Cheers,
Dimitris


On Fri, May 23, 2014 at 6:52 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Fri, May 23, 2014 at 7:40 AM, Dimitris Karampinas 
 dkaram...@gmail.com wrote:

 Thanks for your answers. A script around pstack worked for me.

 (I'm not sure if I should open a new thread, I hope it's OK to ask
 another question here)

 For the workload I run, it seems that PostgreSQL scales with the number of
 concurrent clients up to the point where they reach the number of cores
 (more or less).
 A further increase in the number of clients leads to dramatic performance
 degradation. pstack and perf show that backends block on LWLockAcquire
 calls, so one could assume that the system slows down because multiple
 concurrent transactions access the same data.
 However, I did the following two experiments:
 1) I completely removed the UPDATE transactions from my workload. The
 throughput turned out to be better, yet the trend was the same: increasing
 the number of clients has a very negative performance impact.


 Currently, acquisition and release of all LWLocks, even in shared mode, are
 protected by spinlocks, which are exclusive. So they cause a lot of
 contention even on read-only workloads.  Also, if the working set fits in
 RAM but not in shared_buffers, you will have a lot of exclusive locks on
 the buffer freelist and the buffer mapping tables.



 2) I deployed PostgreSQL on more cores. The throughput improved a lot. If
 the problem were due to concurrency control, the throughput should remain the
 same, no matter the number of hardware contexts.


 Hardware matters!  How did you change the number of cores?

 Cheers,

 Jeff



Re: [PERFORM] Profiling PostgreSQL

2014-05-23 Thread Jeff Janes
On Fri, May 23, 2014 at 10:25 AM, Dimitris Karampinas
dkaram...@gmail.com wrote:

 I want to bypass any disk bottleneck so I store all the data in ramfs (the
 purpose of the project is to profile pg, so I don't care about data loss if
 anything goes wrong).
 Since my data are memory resident, I thought the size of the shared
 buffers wouldn't play much role, yet I have to admit that I saw a difference
 in performance when modifying the shared_buffers parameter.


In which direction?  If making shared_buffers larger improves things, that
suggests that you have contention on the BufFreelistLock.  Increasing
shared_buffers reduces buffer churn (assuming you increase it by enough)
and so decreases that contention.
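
One rough way to see whether buffer churn is actually happening is to watch
buffers_alloc in pg_stat_bgwriter while the benchmark runs (a sketch;
connection settings are assumed to point at the benchmark database):

  while sleep 5; do
      psql -At -c "SELECT now(), buffers_alloc FROM pg_stat_bgwriter"   # counter grows slowly if churn is low
  done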



 I use taskset to control the number of cores that PostgreSQL is deployed
 on.


It can be important which bits you set.  For example, if you have 4 sockets,
each one with a quad-core CPU, you would probably maximize the consequences of
spinlock contention by putting one process on each socket, rather than
putting them all on the same socket.
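
Concretely, assuming cores 0-3 share socket 0, 4-7 share socket 1, and so on
(verify with lscpu; the numbering is only an assumption), the two extremes
would look roughly like:

  taskset -c 0,4,8,12 pg_ctl -D "$PGDATA" start     # one core per socket: worst case for spinlocks
  taskset -c 0-3 pg_ctl -D "$PGDATA" start          # all cores on one socket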



 Is there any parameter/variable in the system that is set dynamically and
 depends on the number of cores?


The number of spins a spinlock goes through before sleeping,
spins_per_delay, is determined dynamically based on how often a tight loop
pays off.  But I don't think this is very sensitive to the exact number
of processors, just the difference between 1 and more than 1.


[PERFORM] Profiling PostgreSQL

2014-05-22 Thread Dimitris Karampinas
Is there any way to get the call stack of a function when profiling
PostgreSQL with perf?
I configured with --enable-debug, ran a benchmark against the system, and
was able to identify a bottleneck.
40% of the time is spent on a spinlock, yet I cannot find the code path
that gets me there.
Using --call-graph with perf record didn't seem to help.

Any ideas?

Cheers,
Dimitris


Re: [PERFORM] Profiling PostgreSQL

2014-05-22 Thread David Boreham

On 5/22/2014 7:27 AM, Dimitris Karampinas wrote:
Is there any way to get the call stack of a function when profiling
PostgreSQL with perf?
I configured with --enable-debug, ran a benchmark against the system,
and was able to identify a bottleneck.
40% of the time is spent on a spinlock, yet I cannot find the
code path that gets me there.

Using --call-graph with perf record didn't seem to help.

Any ideas?

Can you arrange to run 'pstack' a few times on the target process 
(either manually or with a shell script)?
If the probability of the process being in the spinning state is high, 
then this approach should snag you at least one call stack.
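
A rough sketch of such a wrapper script (the pid, sample count and interval
are placeholders):

  #!/bin/sh
  PID=$1                                 # pid of a busy postgres backend
  for i in $(seq 1 50); do
      pstack "$PID" >> "pstack.$PID.log" # append each sampled stack
      sleep 0.2
  done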







Re: [PERFORM] Profiling PostgreSQL

2014-05-22 Thread Tom Lane
Dimitris Karampinas dkaram...@gmail.com writes:
 Is there any way to get the call stack of a function when profiling
 PostgreSQL with perf?
 I configured with --enable-debug, ran a benchmark against the system and
 was able to identify a bottleneck.
 40% of the time is spent on a spinlock, yet I cannot find the code path
 that gets me there.
 Using --call-graph with perf record didn't seem to help.

Call graph data usually isn't trustworthy unless you built the program
with -fno-omit-frame-pointer ...
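
A hedged sketch of the rebuild-and-record cycle this implies (the backend pid
and durations are placeholders):

  ./configure --enable-debug CFLAGS="-O2 -fno-omit-frame-pointer"
  make && make install
  # restart the server, start the benchmark, then attach to a busy backend:
  perf record -g -p "$BACKEND_PID" -- sleep 30
  perf report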

regards, tom lane




Re: [PERFORM] Profiling PostgreSQL

2014-05-22 Thread Michael Paquier
On Thu, May 22, 2014 at 10:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Call graph data usually isn't trustworthy unless you built the program
 with -fno-omit-frame-pointer ...
This page is full of ideas as well:
https://wiki.postgresql.org/wiki/Profiling_with_perf
-- 
Michael




[PERFORM] profiling postgresql queries?

2005-04-12 Thread hubert lubaczewski
hi
i'm not totally sure i should ask on this mailing list - so if you think
i'd better ask someplace else, please let me know.

the problem i have is that specific queries (inserts and updates) take a
long time to run.

of course i do vacuum analyze frequently. i also use explain analyze on
queries.

the problem is that both the inserts and updates operate on
heavily-triggered tables.
and it made me wonder - is there a way to tell how much of the backend's time
was spent on triggers, index updates and so on?
like:
total query time: 1 second
trigger a: 0.50 second
trigger b: 0.25 second
index update: 0.1 second

something like this.

is it possible?
will it ever be possible?

hubert

-- 
hubert lubaczewski
Network Operations Center
eo Networks Sp. z o.o.



Re: [PERFORM] profiling postgresql queries?

2005-04-12 Thread Tom Lane
hubert lubaczewski [EMAIL PROTECTED] writes:
 and it made me wonder - is there a way to tell how much of the backend's time
 was spent on triggers, index updates and so on?

In CVS tip, EXPLAIN ANALYZE will break out the time spent in each
trigger.  This is not in any released version, but if you're desperate
you could load up a play server with your data and test.

regards, tom lane



Re: [PERFORM] profiling postgresql queries?

2005-04-12 Thread hubert lubaczewski
On Tue, Apr 12, 2005 at 10:18:31AM -0400, Alex Turner wrote:
 Speaking of triggers...
 Is there any plan to speed up plpgsql triggers?  Fairly simple
 cross-table insert triggers seem to slow my inserts to a crawl.

plpgsql is quite fast actually. if some triggers slow inserts too much,
i guess you should be able to speed them up with a performance review
of the trigger code.

depesz

-- 
hubert lubaczewski
Network Operations Center
eo Networks Sp. z o.o.




Re: [PERFORM] profiling postgresql queries?

2005-04-12 Thread Michael Fuhr
On Tue, Apr 12, 2005 at 12:46:43PM +0200, hubert lubaczewski wrote:
 
 the problem is that both the inserts and updates operate on
 heavily-triggered tables.
 and it made me wonder - is there a way to tell how much of the backend's time
 was spent on triggers, index updates and so on?
 like:
 total query time: 1 second
 trigger a: 0.50 second
 trigger b: 0.25 second
 index update: 0.1 second

EXPLAIN ANALYZE in 8.1devel (CVS HEAD) prints a few statistics for
triggers:

EXPLAIN ANALYZE UPDATE foo SET x = 10 WHERE x = 20;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Index Scan using foo_x_idx on foo  (cost=0.00..14.44 rows=10 width=22) (actual time=0.184..0.551 rows=7 loops=1)
   Index Cond: (x = 20)
 Trigger row_trig1: time=1.625 calls=7
 Trigger row_trig2: time=1.346 calls=7
 Trigger stmt_trig1: time=1.436 calls=1
 Total runtime: 9.659 ms
(6 rows)

8.1devel changes frequently (sometimes requiring initdb) and isn't
suitable for production, but if the trigger statistics would be
helpful then you could set up a test server and load a copy of your
database into it.  Just beware that because it's bleeding edge, it
might destroy your data and it might behave differently than released
versions.

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/



Re: [PERFORM] profiling postgresql queries?

2005-04-12 Thread hubert lubaczewski
On Tue, Apr 12, 2005 at 08:43:59AM -0600, Michael Fuhr wrote:
 8.1devel changes frequently (sometimes requiring initdb) and isn't
 suitable for production, but if the trigger statistics would be
 helpful then you could set up a test server and load a copy of your
 database into it.  Just beware that because it's bleeding edge, it
 might destroy your data and it might behave differently than released
 versions.

great. this is exactly what i need. thanks for the hint.

depesz

-- 
hubert lubaczewski
Network Operations Center
eo Networks Sp. z o.o.

