[PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-03 Thread Jakub Ouhrabka

Hi all,

we have a dedicated PostgreSQL Linux server with 8 cores (2xX5355). We
came across a strange issue: when running with all 8 cores enabled,
approximately once a minute (the period varies) the system is very busy for
a few seconds (~5-10 s) and we don't know why. The issue doesn't show up
when we tell Linux to use only 2 cores; with 4 cores the problem is there,
but it is still better than with 8 cores. All on the same machine, same
config, same workload. We don't see any apparent reason for these peaks.
We'd like to investigate it further but we don't know what to try next.
Any suggestions? Any tuning tips for Linux+PostgreSQL on an 8-way system?
Can this be connected with our heavy use of LISTEN/NOTIFY and hundreds of
backends in listen mode?


More details are below.

Thanks,

Kuba

System: HP DL360 2x5355, 8 GB RAM, P600+MSA50; internal 2x72 GB RAID 10
for the OS, 10x72 GB disks in RAID 10 for PostgreSQL data and WAL

OS: Linux 2.6 64bit (kernel 2.6.21, 22, 23 makes little difference)
PostgreSQL: 8.2.4 (64bit), shared buffers 1G

Nothing other than PostgreSQL runs on the server. About 800
concurrent backends, the majority of them in LISTEN, doing nothing.
The client interface for most backends is ecpg+libpq.


Problem description:

The system usually runs 80-95% idle. Approximately once a minute,
for about 5-10 s, there is a peak in activity that looks like this:


vmstat (and top or atop) reports 0% idle, 100% in user mode, very low
iowait, low IO activity, and a higher number of context switches than
usual, but not exceedingly high (2000-4000 cs/s vs. the usual 1500 cs/s),
plus a few hundred waiting processes per second (usually 0-1/s). From
looking at top and the running processes we can't see any obvious reason
for the peak. According to the PostgreSQL log, the long-running commands
from these moments are e.g. BEGIN TRANSACTION statements lasting several
seconds.
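A sketch for isolating the peak samples from captured vmstat output, assuming the procps column header (`r b swpd ... cs us sy id wa`) and treating a peak as CPU idle at or near 0%. The function name and the `max_idle` threshold are illustrative assumptions, not something from the thread:

```python
def find_peaks(vmstat_text, max_idle=5):
    """Return (sample, r, cs, us, id) for every vmstat sample whose CPU idle
    percentage is at or below max_idle.

    Columns are looked up by header name, so optional columns such as 'st'
    are tolerated and repeated header blocks are simply re-read."""
    header = None
    sample = 0
    peaks = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "cs" in fields and "id" in fields:
            header = fields  # the column-name line
            continue
        if header is None or len(fields) != len(header):
            continue  # group-title line ("procs ---memory--- ..."), blanks, etc.
        try:
            row = dict(zip(header, map(int, fields)))
        except ValueError:
            continue  # not a numeric data row
        if row["id"] <= max_idle:
            peaks.append((sample, row["r"], row["cs"], row["us"], row["id"]))
        sample += 1
    return peaks
```

For example, capture `vmstat 1 120 > vmstat.log` across a couple of spike periods and pass the file contents to `find_peaks`; the sample indices then give the seconds at which the run queue (`r`) and context switches jumped.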


When only 2 cores are enabled (via the kernel command line), everything
runs smoothly. 4 cores exhibit slightly better behavior than 8 cores
but worse than 2 cores; the peaks are still visible.


We've tried kernel versions 2.6.21-23 (the latest revisions from kernel.org
as of the beginning of December); the pattern changed slightly, but it may
also be that the workload changed slightly.


pgbench or any other stress testing runs smoothly on the server.

The only unusual thing about our usage pattern I can think of is heavy
use of LISTEN/NOTIFY, especially hundreds of backends in listen mode.


When we restart our connected clients, the peaks are not there from time
0; they become visible after a while. It seems something gets synchronized
and then causes trouble.


Since the server is dedicated to PostgreSQL and none of our client
applications run on it, and since there is a difference between 2 and 8
enabled cores, we think the peaks are not caused by our client
applications.


How can we diagnose what is happening during the peaks?
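One low-tech way to see which backends are actually on a CPU during a peak is to sample /proc once a second and list processes in state R. A minimal sketch, assuming Linux /proc and that backends show up with the command name `postgres` (adjust the hypothetical `comm_name` parameter if yours run as `postmaster`). The PIDs it returns could then be matched against the `procpid` column of `pg_stat_activity`:

```python
import os


def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' instead of on whitespace."""
    head, _, tail = line.rpartition(")")
    pid, _, comm = head.partition(" (")
    return int(pid), comm, tail.split()[0]


def running_backends(proc="/proc", comm_name="postgres"):
    """Return sorted PIDs of processes named comm_name that are runnable (state R)."""
    pids = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process vanished or its stat file was unreadable
        if comm == comm_name and state == "R":
            pids.append(pid)
    return sorted(pids)
```

Calling `running_backends()` in a loop with `time.sleep(1)` and printing a timestamp alongside gives a per-second list of busy backends to compare with the quiet periods.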

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] Commit takes a long time.

2008-01-03 Thread Peter Childs
Using PostgreSQL 8.1.10, every so often I get a transaction that takes a
while to commit.

I log everything that takes over 500ms and quite regularly it says things
like

707.036 ms statement: COMMIT

Is there any way to speed this up?

Peter Childs


Re: [PERFORM] Commit takes a long time.

2008-01-03 Thread Pavel Stehule
Hello

On 03/01/2008, Peter Childs <[EMAIL PROTECTED]> wrote:
> Using PostgreSQL 8.1.10, every so often I get a transaction that takes a
> while to commit.
>
> I log everything that takes over 500ms and quite regularly it says things
> like
>
> 707.036 ms statement: COMMIT
>
> Is there any way to speed this up?
>

There can be two causes:
a) trigger activity for DEFERRED constraints
b) slow writes to the WAL

http://www.westnet.com/~gsmith/content/postgresql/

In normal cases COMMIT is a really fast operation.

Regards
Pavel Stehule




Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-03 Thread Sven Geisler
Hi Jakub,

I have a similar server (from Dell), which performs well with our
PostgreSQL application. I guess the peak in context switches is the only
thing you can see.

Anyhow, I think it is your LISTEN/NOTIFY approach that causes this
behaviour. I guess all backends listen for the same notification.
I don't know the exact implementation, but I can imagine that all
backends access the same section of shared memory, which causes
the increase in context switches. More cores means more simultaneous
access.

Can you change your implementation?
- split your problem - create multiple notifications if possible
- do an UNLISTEN if possible
- use another signalling technique

Regards
Sven



-- 
Sven Geisler <[EMAIL PROTECTED]>   Tel +49.30.921017.81  Fax .50
Senior Developer, AEC/communications GmbH & Co. KG Berlin, Germany



Re: [PERFORM] Commit takes a long time.

2008-01-03 Thread Tom Lane
"Peter Childs" <[EMAIL PROTECTED]> writes:
> Using PostgreSQL 8.1.10, every so often I get a transaction that takes a
> while to commit.

> I log everything that takes over 500ms and quite regularly it says things
> like

> 707.036 ms statement: COMMIT

AFAIK there are only two likely explanations for that:

1. You have a lot of deferred triggers that have to run at COMMIT time.

2. The disk system gets so bottlenecked that fsync'ing the commit record
takes a long time.

If it's #2 you could probably correlate the problem with spikes in I/O
activity as seen in iostat or vmstat.
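To line slow commits up against iostat or vmstat, their timestamps first have to be pulled out of the server log. A sketch, assuming log_min_duration_statement-style lines such as `LOG:  duration: 707.036 ms  statement: COMMIT` with a leading `%t` timestamp from log_line_prefix; the exact line shape depends on your logging settings, and lines without a timestamp yield `None`:

```python
import re

# Assumed log line shape (log_min_duration_statement with %t in log_line_prefix):
#   2008-01-03 10:15:02 CET LOG:  duration: 707.036 ms  statement: COMMIT
# Lines without the leading timestamp still match; their timestamp is None.
SLOW_COMMIT = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})?"
    r".*?(?P<ms>\d+(?:\.\d+)?) ms\s+statement:\s*COMMIT\b",
    re.IGNORECASE,
)


def slow_commits(lines, threshold_ms=500.0):
    """Yield (timestamp or None, duration_ms) for each COMMIT at or above threshold_ms."""
    for line in lines:
        m = SLOW_COMMIT.search(line)
        if m and float(m.group("ms")) >= threshold_ms:
            yield m.group("ts"), float(m.group("ms"))
```

The resulting timestamps can be laid next to `iostat -x 1` or `vmstat 1` output captured over the same period to see whether the slow COMMITs coincide with write bursts.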

If it is a disk usage spike then I would make the further guess that
what causes it might be a Postgres checkpoint.  You might be able to
dampen the spike a bit by playing with the checkpoint parameters, but
the only real fix will be 8.3's spread-out-checkpoints feature.
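To judge whether checkpoints plausibly line up with the slow commits, the expected spacing between them can be estimated: in 8.1/8.2 a checkpoint starts every checkpoint_timeout seconds or after checkpoint_segments 16 MB WAL segments have been filled, whichever comes first. A rough sketch; the WAL write rate is an input you would have to measure yourself (for example by watching how fast new segment files appear in pg_xlog), and the function name is hypothetical:

```python
WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size


def checkpoint_interval(wal_bytes_per_sec, checkpoint_segments=3, checkpoint_timeout=300):
    """Rough seconds between checkpoints, taking whichever trigger fires first:
    the timeout, or checkpoint_segments WAL segments filled at the given rate.
    Defaults mirror the 8.1/8.2 settings; a rate of 0 means only the timeout applies."""
    if wal_bytes_per_sec <= 0:
        return float(checkpoint_timeout)
    by_volume = checkpoint_segments * WAL_SEGMENT_BYTES / wal_bytes_per_sec
    return min(float(checkpoint_timeout), by_volume)
```

With the defaults and 1 MB/s of WAL this gives 48 s between checkpoints; with checkpoint_segments = 30 the timeout (300 s) would govern instead. The arithmetic ignores manually issued CHECKPOINT commands, so treat it only as a plausibility check.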

regards, tom lane



Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-03 Thread Tom Lane
Jakub Ouhrabka <[EMAIL PROTECTED]> writes:
> we have a dedicated PostgreSQL Linux server with 8 cores (2xX5355). We
> came across a strange issue: when running with all 8 cores enabled,
> approximately once a minute (the period varies) the system is very busy for
> a few seconds (~5-10 s) and we don't know why. The issue doesn't show up
> when we tell Linux to use only 2 cores; with 4 cores the problem is there,
> but it is still better than with 8 cores. All on the same machine, same
> config, same workload. We don't see any apparent reason for these peaks.

Interesting.  Maybe you could use oprofile to try to see what's
happening?  It sounds a bit like momentary contention for a spinlock,
but exactly what isn't clear.

> Can this be connected with our heavy use of LISTEN/NOTIFY and hundreds of
> backends in listen mode?

Perhaps.  Have you tried logging executions of NOTIFY to see if they are
correlated with the spikes?
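Once NOTIFY executions are logged with timestamps (e.g. log_statement = 'all' with %t in log_line_prefix) and the spike windows are known from vmstat, a quick check is whether NOTIFYs cluster inside the spikes more than chance would suggest. A minimal sketch with hypothetical inputs in seconds; it only compares two shares and is not a statistical test:

```python
def notify_share_in_spikes(notify_times, spikes, window):
    """Compare how many NOTIFYs fall inside spike intervals with how much of
    the observation window the spikes cover.

    notify_times -- timestamps (seconds) of logged NOTIFY statements
    spikes       -- non-overlapping (start, end) intervals, in seconds
    window       -- (start, end) of the whole observation period
    Returns (observed_share, expected_share). If NOTIFYs were unrelated to the
    spikes, the two shares should come out roughly equal."""
    spike_time = sum(end - start for start, end in spikes)
    expected = spike_time / (window[1] - window[0])
    inside = sum(1 for t in notify_times
                 if any(start <= t < end for start, end in spikes))
    observed = inside / len(notify_times) if notify_times else 0.0
    return observed, expected
```

An observed share well above the expected one (say 60% of NOTIFYs landing in spikes that cover 11% of the time) would point at NOTIFY; similar values would argue against it.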

regards, tom lane



Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-03 Thread Jakub Ouhrabka

Hi Tom,

> Interesting.  Maybe you could use oprofile to try to see what's
> happening?  It sounds a bit like momentary contention for a spinlock,
> but exactly what isn't clear.

OK, we're going to try oprofile and will let you know...

> Perhaps.  Have you tried logging executions of NOTIFY to see if they
> are correlated with the spikes?

We didn't log the NOTIFYs, but I don't think they're correlated. We'll take a
detailed look next time we try it (with oprofile).


Thanks for suggestions!

Kuba




Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-03 Thread Jakub Ouhrabka

Hi Sven,

> I guess all backends do listen to the same notification.

Unfortunately not. The backends are listening for different notifications
in different databases. Usually there are only a few LISTENs per database,
with one exception: in one database there are many (hundreds of) LISTENs,
but all for different notifications.


> Can you change your implementation?
> - split your problem - create multiple notifications if possible

Yes, that's already how it works.

> - do an UNLISTEN if possible

Yes, we're issuing unlistens when appropriate.

> - use another signalling technique

We're planning to reduce the number of databases/backends/LISTENs, but
we'd still like to run the system on 8 cores, given that it runs without
any problems on 2 cores...


Thanks for the suggestions!

Kuba



Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-03 Thread Alvaro Herrera
Jakub Ouhrabka wrote:

> > - do an UNLISTEN if possible
>
> Yes, we're issuing unlistens when appropriate.

You are vacuuming pg_listener periodically, yes?  Not that this seems to
have any relationship to your problem, but ...

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-03 Thread Jakub Ouhrabka

Alvaro,

>>> - do an UNLISTEN if possible
>> Yes, we're issuing unlistens when appropriate.
>
> You are vacuuming pg_listener periodically, yes?  Not that this seems
> to have any relationship to your problem, but ...

Yes, autovacuum should take care of this. But we're looking forward to
the multiple autovacuum workers in 8.3, as they should help us during
high-load periods (some tables may wait too long for autovacuum now, but
it's not that big a problem for us...).


Thanks for the great work!

Kuba
