Re: [PERFORM] Postgres Replaying WAL slowly

2014-09-17 Thread Simon Riggs
On 1 July 2014 20:20, Tom Lane t...@sss.pgh.pa.us wrote:
 Jeff Frost j...@pgexperts.com writes:
 On Jun 30, 2014, at 4:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Did you check whether the locks were all on temp tables of the
 ON COMMIT DROP persuasion?

 And indeed it did catch up overnight and the lag increased shortly after a 
 correlating spike in AccessExclusiveLocks that were generated by temp table 
 creation with on commit drop.

 OK, so we have a pretty clear idea of where the problem is now.

 It seems like there are three, not mutually exclusive, ways we might
 address this:

 1. Local revisions inside StandbyReleaseLocks to make it perform better in
 the presence of many locks.  This would only be likely to improve matters
 much if there's a fixable O(N^2) algorithmic issue; but there might well
 be one.

 2. Avoid WAL-logging AccessExclusiveLocks associated with temp tables, on
 the grounds that no standby should be touching them.  I'm not entirely
 sure that that argument is bulletproof though; in particular, even though
 a standby couldn't access the table's data, it's possible that it would be
 interested in seeing consistent catalog entries.

 3. Avoid WAL-logging AccessExclusiveLocks associated with
 new-in-transaction tables, temp or not, on the grounds that no standby
 could even see such tables until they're committed.  We could go a bit
 further and not take out any locks on a new-in-transaction table in the
 first place, on the grounds that other transactions on the master can't
 see 'em either.

 It sounded like Andres had taken a preliminary look at #1 and found a
 possible avenue for improvement, which I'd encourage him to pursue.

 For both #2 and the conservative version of #3, the main implementation
 problem would be whether the lock WAL-logging code has cheap access to
 the necessary information.  I suspect it doesn't.

 The radical version of #3 might be pretty easy to do, at least to the
 extent of removing locks taken out during CREATE TABLE.  I suspect there
 are some assertions or other consistency checks that would get unhappy if
 we manipulate relations without locks, though, so those would have to be
 taught about the exception.  Also, we sometimes forget new-in-transaction
 status during relcache flush events; it's not clear if that would be a
 problem for this.

 I don't plan to work on this myself, but perhaps someone with more
 motivation will want to run with these ideas.

Patch implements option 2 in the above.

Skipping the locks entirely seems like it opens a can of worms.

Skipping the lock for temp tables is valid since locks don't need to
exist on the standby. Any catalog entries for them will exist, but the
rows will show them as temp and nobody would expect them to be valid
outside of the original session.
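
For example, the temp table's catalog row is still visible on the standby, but
flagged as temporary - an illustrative query, not something from the patch:

  SELECT oid::regclass, relpersistence
    FROM pg_class
   WHERE relpersistence = 't';   -- 't' = temporary relation in pg_class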

Patch implements a special case that takes the lock normally, but
skips WAL logging the lock info.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


temp_tables_skip_logging_locks.v1.patch
Description: Binary data



Re: [PERFORM] Postgres Replaying WAL slowly

2014-07-03 Thread Simon Riggs
On 1 July 2014 20:20, Tom Lane t...@sss.pgh.pa.us wrote:

 I don't plan to work on this myself, but perhaps someone with more
 motivation will want to run with these ideas.

I was planning to work on improving performance of replication apply
over the summer, mid July - Aug, so I'll add this to the list.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Postgres Replaying WAL slowly

2014-07-02 Thread Andres Freund
On 2014-07-01 15:20:37 -0400, Tom Lane wrote:
 Jeff Frost j...@pgexperts.com writes:
  On Jun 30, 2014, at 4:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Did you check whether the locks were all on temp tables of the
  ON COMMIT DROP persuasion?
 
  And indeed it did catch up overnight and the lag increased shortly after a 
  correlating spike in AccessExclusiveLocks that were generated by temp table 
  creation with on commit drop.
 
 OK, so we have a pretty clear idea of where the problem is now.
 
 It seems like there are three, not mutually exclusive, ways we might
 address this:
 
 1. Local revisions inside StandbyReleaseLocks to make it perform better in
 the presence of many locks.  This would only be likely to improve matters
 much if there's a fixable O(N^2) algorithmic issue; but there might well
 be one.

 It sounded like Andres had taken a preliminary look at #1 and found a
 possible avenue for improvement, which I'd encourage him to pursue.
 

I don't have the resources to do this right now, but yes, I think we can
relatively easily get rid of the O(num_locks * num_subtransactions)
behaviour.

 2. Avoid WAL-logging AccessExclusiveLocks associated with temp tables, on
 the grounds that no standby should be touching them.  I'm not entirely
 sure that that argument is bulletproof though; in particular, even though
 a standby couldn't access the table's data, it's possible that it would be
 interested in seeing consistent catalog entries.

Hm. We definitely perform the checks for those surprisingly late. It's
possible to do SELECT * FROM pg_temp_nn.whatever; without an error if
there are no rows or if the rest of the plan doesn't access that
table. The check prohibiting access is only in bufmgr.c...
So yeah, I don't think we can do this for 9.4 at least. And even then
it'll still be hard.

 3. Avoid WAL-logging AccessExclusiveLocks associated with
 new-in-transaction tables, temp or not, on the grounds that no standby
 could even see such tables until they're committed.  We could go a bit
 further and not take out any locks on a new-in-transaction table in the
 first place, on the grounds that other transactions on the master can't
 see 'em either.
 
 For both #2 and the conservative version of #3, the main implementation
 problem would be whether the lock WAL-logging code has cheap access to
 the necessary information.  I suspect it doesn't.

Not trivially. It's logged directly in LockAcquireExtended(). We could
add the information into the locktags, as there are unused fields in
relation locktags, but brrr.

 The radical version of #3 might be pretty easy to do, at least to the
 extent of removing locks taken out during CREATE TABLE.  I suspect there
 are some assertions or other consistency checks that would get unhappy if
 we manipulate relations without locks, though, so those would have to be
 taught about the exception.

 Also, we sometimes forget new-in-transaction
 status during relcache flush events; it's not clear if that would be a
 problem for this.

I think that hole is actually pluggable in newer releases - at least
there's no code around that assumes rd_createSubid now is persistent,
even across cache resets.

But I think more importantly it's probably quite possible to hit a
similar problem without ON COMMIT DROP relations. Say DISCARD TEMP
inside a transaction (with several subxacts) or so? So we probably really
should fix the bad scaling.
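
For example, something like this (just a hypothetical sketch of that kind of
workload; every temp table dropped by DISCARD TEMP takes an AccessExclusiveLock
that gets WAL-logged under wal_level = hot_standby):

  BEGIN;
  SAVEPOINT s1;
  CREATE TEMP TABLE scratch_a (id int);  -- plain temp table, no ON COMMIT DROP
  SAVEPOINT s2;
  CREATE TEMP TABLE scratch_b (id int);
  DISCARD TEMP;                          -- drops the session's temp tables
  COMMIT;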

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Postgres Replaying WAL slowly

2014-07-02 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-07-01 15:20:37 -0400, Tom Lane wrote:
 It seems like there are three, not mutually exclusive, ways we might
 address this:

 But I think more importantly it's probably quite possible to hit a
 similar problem without ON COMMIT DROP relations. Say DISCARD TEMP
 inside a transaction (with several subxacts) or so? So we probably really
 should fix the bad scaling.

Well, my thought was that these approaches would address somewhat
different sets of use-cases, and we might well want to do more than one.
Even if StandbyReleaseLocks were zero-cost, not emitting the WAL in the
first place is surely considerably cheaper yet.

regards, tom lane




Re: [PERFORM] Postgres Replaying WAL slowly

2014-07-01 Thread Jeff Frost
On Jun 30, 2014, at 4:57 PM, Jeff Frost j...@pgexperts.com wrote:

 
 On Jun 30, 2014, at 4:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 
 Ah ... that's more like a number I can believe something would have
 trouble coping with.  Did you see a noticeable slowdown with this?
 Now that we've seen that number, of course it's possible there was an
 even higher peak occurring when you saw the trouble.
 
 Perhaps there's an O(N^2) behavior in StandbyReleaseLocks, or maybe
 it just takes awhile to handle that many locks.
 
 Did you check whether the locks were all on temp tables of the
 ON COMMIT DROP persuasion?
 
 
 Unfortunately not, because I went for a poor man's: SELECT count(*) FROM 
 pg_locks WHERE mode = 'AccessExclusiveLock' 
 run in cron every minute.
 
 That said, I'd bet it was mostly ON COMMIT DROP temp tables.
 
 The unfortunate thing is I wouldn't know how to correlate that spike with the 
 corresponding slowdown because the replica is about 5.5hrs lagged at the 
 moment.
 
 Hopefully it will get caught up tonight and we can see if there's a 
 correlation tomorrow.

And indeed it did catch up overnight and the lag increased shortly after a 
correlating spike in AccessExclusiveLocks that were generated by temp table 
creation with on commit drop.




Re: [PERFORM] Postgres Replaying WAL slowly

2014-07-01 Thread Tom Lane
Jeff Frost j...@pgexperts.com writes:
 On Jun 30, 2014, at 4:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Did you check whether the locks were all on temp tables of the
 ON COMMIT DROP persuasion?

 And indeed it did catch up overnight and the lag increased shortly after a 
 correlating spike in AccessExclusiveLocks that were generated by temp table 
 creation with on commit drop.

OK, so we have a pretty clear idea of where the problem is now.

It seems like there are three, not mutually exclusive, ways we might
address this:

1. Local revisions inside StandbyReleaseLocks to make it perform better in
the presence of many locks.  This would only be likely to improve matters
much if there's a fixable O(N^2) algorithmic issue; but there might well
be one.

2. Avoid WAL-logging AccessExclusiveLocks associated with temp tables, on
the grounds that no standby should be touching them.  I'm not entirely
sure that that argument is bulletproof though; in particular, even though
a standby couldn't access the table's data, it's possible that it would be
interested in seeing consistent catalog entries.

3. Avoid WAL-logging AccessExclusiveLocks associated with
new-in-transaction tables, temp or not, on the grounds that no standby
could even see such tables until they're committed.  We could go a bit
further and not take out any locks on a new-in-transaction table in the
first place, on the grounds that other transactions on the master can't
see 'em either.

It sounded like Andres had taken a preliminary look at #1 and found a
possible avenue for improvement, which I'd encourage him to pursue.

For both #2 and the conservative version of #3, the main implementation
problem would be whether the lock WAL-logging code has cheap access to
the necessary information.  I suspect it doesn't.

The radical version of #3 might be pretty easy to do, at least to the
extent of removing locks taken out during CREATE TABLE.  I suspect there
are some assertions or other consistency checks that would get unhappy if
we manipulate relations without locks, though, so those would have to be
taught about the exception.  Also, we sometimes forget new-in-transaction
status during relcache flush events; it's not clear if that would be a
problem for this.

I don't plan to work on this myself, but perhaps someone with more
motivation will want to run with these ideas.

regards, tom lane




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Heikki Linnakangas

On 06/29/2014 03:43 PM, Soni M wrote:

top and sar says 100% cpu usage of one core, no sign of I/O wait.


Hmm, I wonder what it's doing then... If you have perf installed on
the system, you can run perf top to get a quick overview of where the
CPU time is spent.


- Heikki





Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Soni M
Here's what 'perf top' said on streaming replica :

Samples: 26K of event 'cpu-clock', Event count (approx.): 19781
 95.97%  postgres [.] 0x002210f3
  0.41%  perf [.] 0x0005f225
  0.39%  libc-2.12.so [.] __strstr_sse2
  0.22%  libc-2.12.so [.] memchr
  0.22%  [kernel] [k] kallsyms_expand_symbol
  0.18%  perf [.] symbols__insert
  0.18%  [kernel] [k] format_decode
  0.15%  libc-2.12.so [.] __GI___strcmp_ssse3
  0.13%  [kernel] [k] string
  0.12%  [kernel] [k] number
  0.12%  [kernel] [k] vsnprintf
  0.12%  libc-2.12.so [.] _IO_vfscanf
  0.11%  perf [.] dso__find_symbol
  0.11%  [kernel] [k] _spin_unlock_irqrestore
  0.10%  perf [.] hex2u64
  0.10%  postgres [.] hash_search_with_hash_value
  0.09%  perf [.] rb_next
  0.08%  libc-2.12.so [.] memcpy
  0.07%  libc-2.12.so [.] __strchr_sse2
  0.07%  [kernel] [k] clear_page
  0.06%  [kernel] [k] strnlen
  0.05%  perf [.] perf_evsel__parse_sample
  0.05%  perf [.] rb_insert_color
  0.05%  [kernel] [k] pointer



On Mon, Jun 30, 2014 at 2:05 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 06/29/2014 03:43 PM, Soni M wrote:

 top and sar says 100% cpu usage of one core, no sign of I/O wait.


 Hmm, I wonder what it's doing then... If you have perf installed on the
 system, you can do perf top to get a quick overlook of where the CPU time
 is spent.

 - Heikki




-- 
Regards,

Soni Maula Harriz


Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Heikki Linnakangas

On 06/30/2014 05:46 PM, Soni M wrote:

Here's what 'perf top' said on streaming replica :

Samples: 26K of event 'cpu-clock', Event count (approx.): 19781
  95.97%  postgres [.] 0x002210f3


Ok, so it's stuck doing something.. Can you get a build with debug symbols
installed, so that we can see the function name?

- Heikki





Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost

On Jun 30, 2014, at 9:14 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 On 06/30/2014 05:46 PM, Soni M wrote:
 Here's what 'perf top' said on streaming replica :
 
 Samples: 26K of event 'cpu-clock', Event count (approx.): 19781
  95.97%  postgres [.] 0x002210f3
 
 Ok, so it's stuck doing something.. Can you get build with debug symbols 
 installed, so that we could see the function name?
 - Heikki
 

Looks like StandbyReleaseLocks:

Samples: 10K of event 'cpu-clock', Event count (approx.): 8507
 89.21%  postgres  [.] StandbyReleaseLocks
  0.89%  libc-2.12.so  [.] __strstr_sse2
  0.83%  perf  [.] 0x0005f1e5
  0.74%  [kernel]  [k] kallsyms_expand_symbol
  0.52%  libc-2.12.so  [.] memchr
  0.47%  perf  [.] symbols__insert
  0.47%  [kernel]  [k] format_decode



Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Andres Freund
On 2014-06-30 19:14:24 +0300, Heikki Linnakangas wrote:
 On 06/30/2014 05:46 PM, Soni M wrote:
 Here's what 'perf top' said on streaming replica :
 
 Samples: 26K of event 'cpu-clock', Event count (approx.): 19781
   95.97%  postgres [.] 0x002210f3
 
 Ok, so it's stuck doing something.. Can you get build with debug symbols
 installed, so that we could see the function name?

My guess it's a spinlock, probably xlogctl->info_lck via
RecoveryInProgress(). Unfortunately inline assembler doesn't always seem
to show up correctly in profiles...

What worked for me was to build with -fno-omit-frame-pointer - that
normally shows the callers, even if it can't generate a proper symbol
name.

Soni: Do you use Hot Standby? Are there connections active while you
have that problem? Any other processes with high cpu load?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Soni M
On Tue, Jul 1, 2014 at 12:14 AM, Andres Freund and...@2ndquadrant.com
wrote:


 My guess it's a spinlock, probably xlogctl->info_lck via
 RecoveryInProgress(). Unfortunately inline assembler doesn't always seem
 to show up correctly in profiles...

 What worked for me was to build with -fno-omit-frame-pointer - that
 normally shows the callers, even if it can't generate a proper symbol
 name.

 Soni: Do you use Hot Standby? Are there connections active while you
 have that problem? Any other processes with high cpu load?

 Greetings,

 Andres Freund

 --
  Andres Freund http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


It is
 96.62%  postgres [.] StandbyReleaseLocks
as Jeff said. It runs for quite a long time, more than 5 minutes I think.

I also use hot standby. We have 4 streaming replicas; some of them have
active connections, some do not. This issue has lasted more than 4 days. On
one of the standbys, the above postgres process is the only process that
consumes high CPU load.

-- 
Regards,

Soni Maula Harriz


Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost
On Jun 30, 2014, at 10:29 AM, Soni M diptat...@gmail.com wrote:

 
 
 
 On Tue, Jul 1, 2014 at 12:14 AM, Andres Freund and...@2ndquadrant.com wrote:
 
 My guess it's a spinlock, probably xlogctl->info_lck via
 RecoveryInProgress(). Unfortunately inline assembler doesn't always seem
 to show up correctly in profiles...
 
 What worked for me was to build with -fno-omit-frame-pointer - that
 normally shows the callers, even if it can't generate a proper symbol
 name.
 
 Soni: Do you use Hot Standby? Are there connections active while you
 have that problem? Any other processes with high cpu load?
 
 Greetings,
 
 Andres Freund
 
 --
  Andres Freund http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
 
 It is 
  96.62%  postgres [.] StandbyReleaseLocks
  as Jeff said. It runs quite long time, more than 5 minutes i think
 
 i also use hot standby. we have 4 streaming replica, some of them has active 
 connection some has not. this issue has last more than 4 days. On one of the 
 standby, above postgres process is the only process that consume high cpu 
 load.

compiled with -fno-omit-frame-pointer doesn't yield much more info:

 76.24%  postgres   [.] StandbyReleaseLocks
  2.64%  libcrypto.so.1.0.1e [.] md5_block_asm_data_order
  2.19%  libcrypto.so.1.0.1e [.] RC4
  2.17%  postgres   [.] RecordIsValid
  1.20%  [kernel]   [k] copy_user_generic_unrolled
  1.18%  [kernel]   [k] _spin_unlock_irqrestore
  0.97%  [vmxnet3]  [k] vmxnet3_poll_rx_only
  0.87%  [kernel]   [k] __do_softirq
  0.77%  [vmxnet3]  [k] vmxnet3_xmit_frame
  0.69%  postgres   [.] hash_search_with_hash_value
  0.68%  [kernel]   [k] fin

However, this server started progressing through the WAL files quite a bit 
better before I finished compiling, so we'll leave it running with this version 
and see if there's more info available the next time it starts replaying slowly.




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Andres Freund
On 2014-06-30 11:34:52 -0700, Jeff Frost wrote:
 On Jun 30, 2014, at 10:29 AM, Soni M diptat...@gmail.com wrote:

  It is 
   96.62%  postgres [.] StandbyReleaseLocks
   as Jeff said. It runs quite long time, more than 5 minutes i think
  
  i also use hot standby. we have 4 streaming replica, some of them has 
  active connection some has not. this issue has last more than 4 days. On 
  one of the standby, above postgres process is the only process that consume 
  high cpu load.
 
 compiled with -fno-omit-frame-pointer doesn't yield much more info:

You'd need to do perf record -ga instead of perf record -a to see
additional information.

But:

  76.24%  postgres   [.] StandbyReleaseLocks

already is quite helpful.

What are you doing on that system? Is there anything requiring large
amounts of access exclusive locks on the primary? Possibly large amounts
of temporary relations?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Pavel Stehule
2014-06-30 20:34 GMT+02:00 Jeff Frost j...@pgexperts.com:

 On Jun 30, 2014, at 10:29 AM, Soni M diptat...@gmail.com wrote:




 On Tue, Jul 1, 2014 at 12:14 AM, Andres Freund and...@2ndquadrant.com
 wrote:


 My guess it's a spinlock, probably xlogctl->info_lck via
 RecoveryInProgress(). Unfortunately inline assembler doesn't always seem
 to show up correctly in profiles...


For this kind of issue, systemtap or dtrace can be useful

http://postgres.cz/wiki/Monitorov%C3%A1n%C3%AD_lwlocku_pomoc%C3%AD_systemtapu

you can identify which locking is the problem - please use Google Translate for the page

Regards

Pavel


  What worked for me was to build with -fno-omit-frame-pointer - that
 normally shows the callers, even if it can't generate a proper symbol
 name.

 Soni: Do you use Hot Standby? Are there connections active while you
 have that problem? Any other processes with high cpu load?

 Greetings,

 Andres Freund

 --
  Andres Freund http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


 It is
  96.62%  postgres [.] StandbyReleaseLocks
  as Jeff said. It runs quite long time, more than 5 minutes i think

 i also use hot standby. we have 4 streaming replica, some of them has
 active connection some has not. this issue has last more than 4 days. On
 one of the standby, above postgres process is the only process that consume
 high cpu load.


 compiled with -fno-omit-frame-pointer doesn't yield much more info:

  76.24%  postgres   [.] StandbyReleaseLocks
   2.64%  libcrypto.so.1.0.1e   [.] md5_block_asm_data_order
   2.19%  libcrypto.so.1.0.1e   [.] RC4
   2.17%  postgres              [.] RecordIsValid
   1.20%  [kernel]              [k] copy_user_generic_unrolled
   1.18%  [kernel]              [k] _spin_unlock_irqrestore
   0.97%  [vmxnet3]             [k] vmxnet3_poll_rx_only
   0.87%  [kernel]              [k] __do_softirq
   0.77%  [vmxnet3]             [k] vmxnet3_xmit_frame
   0.69%  postgres              [.] hash_search_with_hash_value
   0.68%  [kernel]              [k] fin

 However, this server started progressing through the WAL files quite a bit
 better before I finished compiling, so we'll leave it running with this
 version and see if there's more info available the next time it starts
 replaying slowly.





Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost
On Jun 30, 2014, at 11:39 AM, Andres Freund and...@2ndquadrant.com wrote:

 On 2014-06-30 11:34:52 -0700, Jeff Frost wrote:
 On Jun 30, 2014, at 10:29 AM, Soni M diptat...@gmail.com wrote:
 
 It is 
 96.62%  postgres [.] StandbyReleaseLocks
 as Jeff said. It runs quite long time, more than 5 minutes i think
 
 i also use hot standby. we have 4 streaming replica, some of them has 
 active connection some has not. this issue has last more than 4 days. On 
 one of the standby, above postgres process is the only process that consume 
 high cpu load.
 
 compiled with -fno-omit-frame-pointer doesn't yield much more info:
 
 You'd need to do perf record -ga instead of perf record -a to see
 additional information.
 

Ah! That's right.

Here's how that looks:

Samples: 473K of event 'cpu-clock', Event count (approx.): 473738
+  68.42%  init      [kernel.kallsyms]    [k] native_safe_halt
+  26.07%  postgres  postgres             [.] StandbyReleaseLocks
+   2.82%  swapper   [kernel.kallsyms]    [k] native_safe_halt
+   0.19%  ssh       libcrypto.so.1.0.1e  [.] md5_block_asm_data_order
+   0.19%  postgres  postgres             [.] RecordIsValid
+   0.16%  ssh       libcrypto.so.1.0.1e  [.] RC4
+   0.10%  postgres  postgres             [.] hash_search_with_hash_value
+   0.06%  postgres  [kernel.kallsyms]    [k] _spin_unlock_irqrestore
+   0.05%  init      [vmxnet3]            [k] vmxnet3_poll_rx_only
+   0.04%  postgres  [kernel.kallsyms]    [k] copy_user_generic_unrolled
+   0.04%  init      [kernel.kallsyms]    [k] finish_task_switch
+   0.04%  init      [kernel.kallsyms]    [k] __do_softirq
+   0.04%  ssh       [kernel.kallsyms]    [k] _spin_unlock_irqrestore
+   0.04%  ssh       [vmxnet3]            [k] vmxnet3_xmit_frame
+   0.03%  postgres  postgres             [.] PinBuffer
+   0.03%  init      [vmxnet3]            [k] vmxnet3_xmit_frame
+   0.03%  ssh       [kernel.kallsyms]    [k] copy_user_generic_unrolled
+   0.03%  postgres  postgres             [.] XLogReadBufferExtended
+   0.03%  ssh       ssh                  [.] 0x0002aa07
+   0.03%  init      [kernel.kallsyms]    [k] _spin_unlock_irqrestore
+   0.03%  ssh       [vmxnet3]            [k] vmxnet3_poll_rx_only
+   0.02%  ssh       [kernel.kallsyms]    [k] __do_softirq
+   0.02%  postgres  libc-2.12.so         [.] _wordcopy_bwd_dest_aligned
+   0.02%  postgres  postgres             [.] mdnblocks
+   0.02%  ssh       libcrypto.so.1.0.1e  [.] 0x000e25a1
+   0.02%  scp       [kernel.kallsyms]    [k] copy_user_generic_unrolled
+   0.02%  ssh       libc-2.12.so         [.] memcpy
+   0.02%  postgres  libc-2.12.so         [.] memcpy


 But:
 
 76.24%  postgres   [.] StandbyReleaseLocks
 
 already is quite helpful.
 
 What are you doing on that system? Is there anything requiring large
 amounts of access exclusive locks on the primary? Possibly large amounts
 of temporary relations?


The last time we did a 100% logging run, the peak temp table creation was 
something like 120k/hr, but the replicas seemed able to keep up with that just 
fine.

Hopefully Soni can answer whether that has increased significantly since May.





Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost
On Jun 30, 2014, at 12:17 PM, Jeff Frost j...@pgexperts.com wrote:

 
 already is quite helpful.
 
 What are you doing on that system? Is there anything requiring large
 amounts of access exclusive locks on the primary? Possibly large amounts
 of temporary relations?
 
 
 The last time we did a 100% logging run, the peak temp table creation was 
 something like 120k/hr, but the replicas seemed able to keep up with that 
 just fine.
 

Sampling pg_locks on the primary shows ~50 locks with ExclusiveLock mode:

           mode           | count
--------------------------+-------
 AccessExclusiveLock      |    11
 AccessShareLock          |  2089
 ExclusiveLock            |    46
 RowExclusiveLock         |    81
 RowShareLock             |    17
 ShareLock                |     4
 ShareUpdateExclusiveLock |     5
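
(That's just a count per mode - roughly SELECT mode, count(*) FROM pg_locks
GROUP BY mode ORDER BY mode; or something similar.)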

Seems to be relatively consistent.  Of course, it's hard to say what it looked 
like back when the issue began.





Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Tom Lane
Jeff Frost j...@pgexperts.com writes:
 Sampling pg_locks on the primary shows ~50 locks with ExclusiveLock mode:

mode   | count
 --+---
  AccessExclusiveLock  |11
  AccessShareLock  |  2089
  ExclusiveLock|46
  RowExclusiveLock |81
  RowShareLock |17
  ShareLock| 4
  ShareUpdateExclusiveLock | 5

That's not too helpful if you don't pay attention to what the lock is on;
it's likely that all the ExclusiveLocks are on transactions' own XIDs,
which isn't relevant to the standby's behavior.  The AccessExclusiveLocks
are probably interesting though --- you should look to see what those
are on.
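
For example, something along these lines (any of the pg_locks columns you care
about will do):

  SELECT locktype, relation, classid, objid, virtualtransaction, pid
    FROM pg_locks
   WHERE mode = 'AccessExclusiveLock';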

regards, tom lane




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost

On Jun 30, 2014, at 12:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Jeff Frost j...@pgexperts.com writes:
 Sampling pg_locks on the primary shows ~50 locks with ExclusiveLock mode:
 
   mode   | count
 --+---
 AccessExclusiveLock  |11
 AccessShareLock  |  2089
 ExclusiveLock|46
 RowExclusiveLock |81
 RowShareLock |17
 ShareLock| 4
 ShareUpdateExclusiveLock | 5
 
 That's not too helpful if you don't pay attention to what the lock is on;
 it's likely that all the ExclusiveLocks are on transactions' own XIDs,
 which isn't relevant to the standby's behavior.  The AccessExclusiveLocks
 are probably interesting though --- you should look to see what those
 are on.

You're right about the ExclusiveLocks.

Here's how the AccessExclusiveLocks look:

 locktype | database |  relation  | page | tuple | virtualxid | transactionid | classid |   objid    | objsubid | virtualtransaction |  pid  |        mode         | granted
----------+----------+------------+------+-------+------------+---------------+---------+------------+----------+--------------------+-------+---------------------+---------
 relation |   111285 | 3245291551 |      |       |            |               |         |            |          | 233/170813         | 23509 | AccessExclusiveLock | t
 relation |   111285 | 3245292820 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292833 |      |       |            |               |         |            |          | 173/1723993        | 23407 | AccessExclusiveLock | t
 relation |   111285 | 3245287874 |      |       |            |               |         |            |          | 133/3818415        | 23348 | AccessExclusiveLock | t
 relation |   111285 | 3245292836 |      |       |            |               |         |            |          | 173/1723993        | 23407 | AccessExclusiveLock | t
 relation |   111285 | 3245292774 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292734 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292827 |      |       |            |               |         |            |          | 173/1723993        | 23407 | AccessExclusiveLock | t
 relation |   111285 | 3245288540 |      |       |            |               |         |            |          | 133/3818415        | 23348 | AccessExclusiveLock | t
 relation |   111285 | 3245292773 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292775 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292743 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292751 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245288669 |      |       |            |               |         |            |          | 133/3818415        | 23348 | AccessExclusiveLock | t
 relation |   111285 | 3245292817 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245288657 |      |       |            |               |         |            |          | 133/3818415        | 23348 | AccessExclusiveLock | t
 object   |   111285 |            |      |       |            |               |    2615 | 1246019760 |        0 | 233/170813         | 23509 | AccessExclusiveLock | t
 relation |   111285 | 3245292746 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245287876 |      |       |            |               |         |            |          | 133/3818415        | 23348 | AccessExclusiveLock | t
 relation |   111285 | 3245292739 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292826 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292825 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292832 |      |       |

Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Matheus de Oliveira
On Mon, Jun 30, 2014 at 4:42 PM, Jeff Frost j...@pgexperts.com wrote:

 And if you go fishing in pg_class for any of the oids, you don't find
 anything:


That is probably because you are connected to the wrong database. Once you
connect to the database of interest, you don't even need to query pg_class;
just cast the relation attribute to regclass:

SELECT relation::regclass, ...
FROM pg_locks WHERE database = (SELECT oid FROM pg_database WHERE
datname = current_database());

Regards,
-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres


Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost

On Jun 30, 2014, at 12:54 PM, Matheus de Oliveira matioli.math...@gmail.com 
wrote:

 
 On Mon, Jun 30, 2014 at 4:42 PM, Jeff Frost j...@pgexperts.com wrote:
 And if you go fishing in pg_class for any of the oids, you don't find 
 anything:
 
 That is probably because you are connected in the wrong database. Once you 
 connect to the database of interest, you don't even need to query pg_class, 
 just cast relation attribute to regclass:
 
 SELECT relation::regclass, ...
 FROM pg_locks WHERE database = (SELECT oid FROM pg_database WHERE datname 
 = current_database());
 

Yah, I thought about that too, but verified I am in the correct DB.  Just for
clarity's sake:

SELECT relation::regclass
FROM pg_locks WHERE database = (SELECT oid FROM pg_database WHERE datname = 
current_database()) and mode = 'AccessExclusiveLock';

  relation
------------
 3245508214
 3245508273
 3245508272
 3245508257
 3245508469
 3245508274
 3245508373
 3245508468
 3245508210
 3245508463
 3245508205
 3245508260
 3245508265
 3245508434
(16 rows)

Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Andres Freund
On 2014-06-30 12:57:56 -0700, Jeff Frost wrote:
 
 On Jun 30, 2014, at 12:54 PM, Matheus de Oliveira matioli.math...@gmail.com 
 wrote:
 
  
  On Mon, Jun 30, 2014 at 4:42 PM, Jeff Frost j...@pgexperts.com wrote:
  And if you go fishing in pg_class for any of the oids, you don't find 
  anything:
  
  That is probably because you are connected in the wrong database. Once you 
  connect to the database of interest, you don't even need to query pg_class, 
  just cast relation attribute to regclass:
  
  SELECT relation::regclass, ...
  FROM pg_locks WHERE database = (SELECT oid FROM pg_database WHERE 
  datname = current_database());
  
 
 Yah, i thought about that too, but verified I am in the correct DB.  Just for 
 clarity sake:

So these are probably relations created in uncommitted
transactions. Possibly ON COMMIT DROP temp tables?
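
I.e. something along these lines, still uncommitted at the moment pg_locks was
sampled (hypothetical example):

  BEGIN;
  CREATE TEMP TABLE tmp_scratch (id int) ON COMMIT DROP;
  -- at this point the table's pg_class row and its AccessExclusiveLock are
  -- visible, but the relation disappears again at COMMIT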

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost

On Jun 30, 2014, at 1:15 PM, Andres Freund and...@2ndquadrant.com wrote:

 On 2014-06-30 12:57:56 -0700, Jeff Frost wrote:
 
 On Jun 30, 2014, at 12:54 PM, Matheus de Oliveira 
 matioli.math...@gmail.com wrote:
 
 
 On Mon, Jun 30, 2014 at 4:42 PM, Jeff Frost j...@pgexperts.com wrote:
 And if you go fishing in pg_class for any of the oids, you don't find 
 anything:
 
 That is probably because you are connected in the wrong database. Once you 
 connect to the database of interest, you don't even need to query pg_class, 
 just cast relation attribute to regclass:
 
SELECT relation::regclass, ...
FROM pg_locks WHERE database = (SELECT oid FROM pg_database WHERE 
 datname = current_database());
 
 
 Yah, i thought about that too, but verified I am in the correct DB.  Just 
 for clarity sake:
 
 So these are probably relations created in uncommitted
 transactions. Possibly ON COMMIT DROP temp tables?


That would make sense.  There are definitely quite a few of those being used.

Another item of note is the system catalogs are quite bloated:

 schemaname |  tablename   | tbloat | wastedmb | idxbloat | wastedidxmb
------------+--------------+--------+----------+----------+-------------
 pg_catalog | pg_attribute |   3945 |   106.51 | 2770 |  611.28
 pg_catalog | pg_class |   8940 |45.26 | 4420 |   47.89
 pg_catalog | pg_type  |   4921 |18.45 | 5850 |   81.16
 pg_catalog | pg_depend|933 |10.23 |11730 |  274.37
 pg_catalog | pg_index |   3429 | 8.36 | 3920 |   24.24
 pg_catalog | pg_shdepend  |983 | 2.67 | 9360 |   30.56
(6 rows)

Would that cause the replica to spin on StandbyReleaseLocks?




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Tom Lane
Jeff Frost j...@pgexperts.com writes:
 On Jun 30, 2014, at 1:15 PM, Andres Freund and...@2ndquadrant.com wrote:
 So these are probably relations created in uncommitted
 transactions. Possibly ON COMMIT DROP temp tables?

 That would make sense.  There are definitely quite a few of those being used.

Uh-huh.  I doubt that the mechanism that handles propagation of
AccessExclusiveLocks to the standby is smart enough to ignore locks
on temp tables :-(

 Another item of note is the system catalogs are quite bloated:
 Would that cause the replica to spin on StandbyReleaseLocks?

AFAIK, no.  It's an unsurprising consequence of heavy use of short-lived
temp tables though.

So it seems like we have a candidate explanation.  I'm a bit surprised
that StandbyReleaseLocks would get this slow if there are only a dozen
AccessExclusiveLocks in place at any one time, though.  Perhaps that
was a low point and there are often many more?

regards, tom lane




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost

On Jun 30, 2014, at 1:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 
 
 Another item of note is the system catalogs are quite bloated:
 Would that cause the replica to spin on StandbyReleaseLocks?
 
 AFAIK, no.  It's an unsurprising consequence of heavy use of short-lived
 temp tables though.
 

Yah, this has been an issue in the past, so we tend to cluster them regularly 
during off-hours to minimize the issue.

 So it seems like we have a candidate explanation.  I'm a bit surprised
 that StandbyReleaseLocks would get this slow if there are only a dozen
 AccessExclusiveLocks in place at any one time, though.  Perhaps that
 was a low point and there are often many more?
 
   

Entirely possible that it was a low point.  We'll set up some monitoring to 
track the number of AccessExclusiveLocks and see how much variance there is 
throughout the day.
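
Probably just a cron'd sample of something like this every minute (a sketch,
not the exact monitoring we'll deploy):

  SELECT now() AS sample_time,
         count(*) AS access_exclusive_locks
    FROM pg_locks
   WHERE mode = 'AccessExclusiveLock';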






Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost

On Jun 30, 2014, at 1:46 PM, Jeff Frost j...@pgexperts.com wrote:

 So it seems like we have a candidate explanation.  I'm a bit surprised
 that StandbyReleaseLocks would get this slow if there are only a dozen
 AccessExclusiveLocks in place at any one time, though.  Perhaps that
 was a low point and there are often many more?
 
  
 
 Entirely possible that it was a low point.  We'll set up some monitoring to 
 track the number of AccessExclusiveLocks and see how much variance there is 
 throughout the day.


Since we turned on the monitoring for that, we had a peak of 13,550 
AccessExclusiveLocks.  So far most of the samples have been in the double
digits, with that and two other outliers: 6,118 and 12,747.

Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Tom Lane
Jeff Frost j...@pgexperts.com writes:
 So it seems like we have a candidate explanation.  I'm a bit surprised
 that StandbyReleaseLocks would get this slow if there are only a dozen
 AccessExclusiveLocks in place at any one time, though.  Perhaps that
 was a low point and there are often many more?

 Since we turned on the monitoring for that, we had a peak of 13,550
 AccessExclusiveLocks.

Ah ... that's more like a number I can believe something would have
trouble coping with.  Did you see a noticeable slowdown with this?
Now that we've seen that number, of course it's possible there was an
even higher peak occurring when you saw the trouble.

Perhaps there's an O(N^2) behavior in StandbyReleaseLocks, or maybe
it just takes awhile to handle that many locks.

Did you check whether the locks were all on temp tables of the
ON COMMIT DROP persuasion?

regards, tom lane




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Andres Freund
On 2014-06-30 19:04:20 -0400, Tom Lane wrote:
 Jeff Frost j...@pgexperts.com writes:
  So it seems like we have a candidate explanation.  I'm a bit surprised
  that StandbyReleaseLocks would get this slow if there are only a dozen
  AccessExclusiveLocks in place at any one time, though.  Perhaps that
  was a low point and there are often many more?
 
  Since we turned on the monitoring for that, we had a peak of 13,550
  AccessExclusiveLocks.

Any chance the workload also uses lots of subtransactions?

 Ah ... that's more like a number I can believe something would have
 trouble coping with.  Did you see a noticeable slowdown with this?
 Now that we've seen that number, of course it's possible there was an
 even higher peak occurring when you saw the trouble.
 
 Perhaps there's an O(N^2) behavior in StandbyReleaseLocks, or maybe
 it just takes awhile to handle that many locks.

I don't think there's a O(n^2) in StandbyReleaseLocks() itself, but in
combination with StandbyReleaseLockTree() it looks possibly bad. The
latter will call StandbyReleaseLocks() for every xid/subxid, and each of
the StandbyReleaseLocks() will then trawl the entire RecoveryLockList...

It'd probably be better to implement StandbyReleaseLockTree() by sorting the
subxid list and bsearching it while iterating over the RecoveryLockList.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost

On Jun 30, 2014, at 4:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Ah ... that's more like a number I can believe something would have
 trouble coping with.  Did you see a noticeable slowdown with this?
 Now that we've seen that number, of course it's possible there was an
 even higher peak occurring when you saw the trouble.
 
 Perhaps there's an O(N^2) behavior in StandbyReleaseLocks, or maybe
 it just takes awhile to handle that many locks.
 
 Did you check whether the locks were all on temp tables of the
 ON COMMIT DROP persuasion?


Unfortunately not, because I went for a poor man's: SELECT count(*) FROM 
pg_locks WHERE mode = 'AccessExclusiveLock' 
run in cron every minute.

That said, I'd bet it was mostly ON COMMIT DROP temp tables.

The unfortunate thing is I wouldn't know how to correlate that spike with the 
corresponding slowdown because the replica is about 5.5hrs lagged at the moment.

Hopefully it will get caught up tonight and we can see if there's a correlation 
tomorrow.

[PERFORM] Postgres Replaying WAL slowly

2014-06-29 Thread Soni M
Hello Everyone ...

We have 6 PG 9.1.12 installations: one master (Ubuntu 10.04), one Slony
slave (Ubuntu 10.04), and four streaming replicas (2 on Ubuntu 10.04 and 2 on
RHEL 6.5 (Santiago), which are in a different datacenter). All the Ubuntu
machines are in the same datacenter. The master sends WAL archives to the
Slony slave.

This is the infrastructure description :
200Mbit link between data centers, esx 5.5 on hp blade chassis. proliant
gen 7 blades. postgres servers dedicated to esx hosts (no other vms on
those esx hosts). 3par disk backends with 4 and 8 Gbit fiber channel
connections. 10Gbit ethernet virtual connects on the hp chassis. cisco
fabric and network switches.

All postgres installed from Ubuntu/RHEL package.

Everything worked fine until Thursday, when we had high load on the master;
since then every streaming replica lags further and further behind the master,
even at night and on weekends when all server load is low. But the Slony slave
is completely fine.

We thought it was due to the network, so we decided to copy WAL files to local
disk on one streaming server and replay the WAL from there. After a PG restart
it replays WAL at a good speed, about 3 seconds per WAL file, but as time goes
on the speed decreases. We got to 30 seconds per WAL file; the worst we have
seen is 3 minutes to replay one WAL file.

The rate of WAL produced by the master is normal, as usual. Also, on Thursday
we had WAL files in pg_xlog on the streaming replica server, but no other WAL
files.

This is the configuration :
SELECT name, current_setting(name)
  FROM pg_settings
  WHERE source NOT IN ('default', 'override');
  name  |  current_setting
--------------------------------+------------------------------------------
 application_name   | psql
 archive_command| /var/lib/postgresql/scripts/wal_archive
%p %f
 archive_mode   | on
 checkpoint_completion_target   | 0.7
 checkpoint_segments| 30
 client_encoding| UTF8
 DateStyle  | ISO, MDY
 default_text_search_config | pg_catalog.english
 effective_cache_size   | 125GB
 effective_io_concurrency   | 3
 external_pid_file  | /var/run/postgresql/9.1-main.pid
 hot_standby| on
 hot_standby_feedback   | on
 lc_messages| en_US.UTF-8
 lc_monetary| en_US.UTF-8
 lc_numeric | en_US.UTF-8
 lc_time| en_US.UTF-8
 listen_addresses   | *
 log_checkpoints| on
 log_connections| on
 log_destination| csvlog
 log_directory  | pg_log
 log_disconnections | on
 log_filename   | postgresql-%a.log
 log_line_prefix| %t
 log_lock_waits | on
 log_rotation_age   | 1d
 log_rotation_size  | 0
 log_temp_files | 100kB
 log_timezone   | localtime
 log_truncate_on_rotation   | on
 logging_collector  | on
 maintenance_work_mem   | 1GB
 max_connections| 750
 max_locks_per_transaction  | 900
 max_pred_locks_per_transaction | 900
 max_stack_depth| 2MB
 max_wal_senders| 6
 port   | 5432
 shared_buffers | 8GB
 ssl| on
 temp_buffers   | 64MB
 TimeZone   | America/Chicago
 unix_socket_directory  | /var/run/postgresql
 wal_keep_segments  | 50
 wal_level  | hot_standby
 work_mem   | 256MB
(47 rows)

Thanks for any help

-- 
Regards,

Soni Maula Harriz


Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-29 Thread Heikki Linnakangas

On 06/29/2014 11:14 AM, Soni M wrote:

Everything works fine until on Thursday we have high load on master, and
after that every streaming replica lag further behind the master. Even on
night and weekend where all server load is low. But the slony slave is OK
at all.


What does 'top' on the standby say? Is the startup process using 100% of 
(one) CPU replaying records, or is it waiting for I/O? How large is the 
database, does it fit in RAM? Any clues in the system or PostgreSQL logs?


- Heikki





Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-29 Thread Soni M
top and sar say 100% CPU usage of one core, no sign of I/O wait. The
database is 1.5TB in size. RAM on the master is 145GB; on the slaves it
differs, some have about 16GB, another also has 145GB.

Nothing suspicious in the standby's postgres log.

on master's postgres log :
WARNING,01000,pgstat wait timeout,
ERROR,57014,canceling autovacuum task,automatic vacuum of
table consprod._consprod_replication.sl_event
ERROR,57014,canceling statement due to statement timeout,,
PARSE,2014-06-26 00:39:35 CDT,91/0,0,ERROR,25P02,current transaction is
aborted, commands ignored until end of transaction block,,select
1,,,
could not receive data from client: Connection reset by peer,

The log files are big anyway. If you can specify some pattern to look for in
the logs, that would really help.


On Sun, Jun 29, 2014 at 3:31 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 06/29/2014 11:14 AM, Soni M wrote:

 Everything works fine until on Thursday we have high load on master, and
 after that every streaming replica lag further behind the master. Even on
 night and weekend where all server load is low. But the slony slave is OK
 at all.


 What does 'top' on the standby say? Is the startup process using 100% of
 (one) CPU replaying records, or is it waiting for I/O? How large is the
 database, does it fit in RAM? Any clues in the system or PostgreSQL logs?

 - Heikki




-- 
Regards,

Soni Maula Harriz