date:20140630

Pujol Mathieu mathieu.pu...@realfusio.com writes:
 Le 29/06/2014 22:30, Tom Lane a écrit :
 I don't actually understand what's being requested here that the
 NotConsistent case doesn't already cover.

 The NotConsistent case is correctly covered, the sub nodes are not 
 tested because I know that no child could pass the consistent_test.
 The MaybeConsistent case is also correctly covered, all sub nodes are 
 tested because I don't know which sub nodes will pass the consistent_test.
 My problem is with the FullyConsistent, because when I test a node I can 
 know that all it's childs nodes and leaves will pass the test, so I want 
 to notify GIST framework that it can't skip consistent test on those 
 nodes. Like we can notify it when testing a leaf that it could skip 
 consistent test on the row. Maybe I miss something on the API to do 
 that. On my tests, the recheck_flag works only for leaves.

Hm ... that doesn't seem like a case that'd come up often enough to be
worth complicating the APIs for, unless maybe you are expecting a lot
of exact-duplicate index entries.  If you are, you might find that GIN
is a better fit for your problem than GIST --- it's designed to be
efficient for lots-of-duplicates.

Another view of this is that if you can make exact satisfaction checks
at upper-page entries, you're probably storing too much information in
the index entries (and thereby bloating the index).  The typical tradeoff
in GIST indexes is something like storing bounding boxes for geometric
objects --- which is necessarily lossy, but it results in small indexes
that are fast to search.  It's particularly important for upper-page
entries to be small, so that fanout is high and you have a better chance
of keeping all the upper pages in cache.

If you've got a compelling example where this actually makes sense,
I'd be curious to hear the details.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Soni M

Here's what 'perf top' said on streaming replica :

Samples: 26K of event 'cpu-clock', Event count (approx.): 19781
 95.97%  postgres [.] 0x002210f3
  0.41%  perf [.] 0x0005f225
  0.39%  libc-2.12.so [.] __strstr_sse2
  0.22%  libc-2.12.so [.] memchr
  0.22%  [kernel] [k] kallsyms_expand_symbol
  0.18%  perf [.] symbols__insert
  0.18%  [kernel] [k] format_decode
  0.15%  libc-2.12.so [.] __GI___strcmp_ssse3
  0.13%  [kernel] [k] string
  0.12%  [kernel] [k] number
  0.12%  [kernel] [k] vsnprintf
  0.12%  libc-2.12.so [.] _IO_vfscanf
  0.11%  perf [.] dso__find_symbol
  0.11%  [kernel] [k] _spin_unlock_irqrestore
  0.10%  perf [.] hex2u64
  0.10%  postgres [.]
hash_search_with_hash_value
  0.09%  perf [.] rb_next
  0.08%  libc-2.12.so [.] memcpy
  0.07%  libc-2.12.so [.] __strchr_sse2
  0.07%  [kernel] [k] clear_page
  0.06%  [kernel] [k] strnlen
  0.05%  perf [.] perf_evsel__parse_sample
  0.05%  perf [.] rb_insert_color
  0.05%  [kernel] [k] pointer



On Mon, Jun 30, 2014 at 2:05 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 06/29/2014 03:43 PM, Soni M wrote:

 top and sar says 100% cpu usage of one core, no sign of I/O wait.


 Hmm, I wonder what it's doing then... If you have perf installed on the
 system, you can do perf top to get a quick overlook of where the CPU time
 is spent.

 - Heikki




-- 
Regards,

Soni Maula Harriz

Re: [PERFORM] Volatility - docs vs behaviour?

Craig Ringer cr...@2ndquadrant.com writes:
The docs say:

For best optimization results, you should label your functions with the
strictest volatility category that is valid for them.

Yeah ...

... but I recall discussion here suggesting that in fact IMMUTABLE
functions may not be inlined where you'd expect, e.g.
http://www.postgresql.org/message-id/CAFj8pRBF3Qr7WtQwO1H_WN=hhfgk0semwhde+odz3iyv-tr...@mail.gmail.com

The reason that case behaved surprisingly was exactly that the user had
violated the above bit of documentation, ie, he'd marked the function
*incorrectly* as being immutable when in fact its contained functions
were only stable.

I know STRICT can prevent inlining (unfortunately, though necessarily),
but it seems inexplicable that IMMUTABLE should.

I don't see why you find that inexplicable. If the planner were to
inline this function, it would then fail to reduce a call with constant
argument to a constant, which is presumably what the user desires from
marking it immutable (questions of correctness in the face of timezone
changes notwithstanding). Just as we keep the wrapper on when it's
necessary to hide possible non-strictness of the body of a function,
we must do so when inlining would raise the visible volatility of an
expression.

It's true that the above-quoted bit of advice presumes that you correctly
identify the strictest volatility category that is valid for a given
function. If you're too lazy or uninformed to do that, it might be
better to leave the settings at defaults (volatile/nonstrict) and hope
the planner can figure out that it's safe to inline anyway.

regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Heikki Linnakangas


On 06/30/2014 05:46 PM, Soni M wrote:

Here's what 'perf top' said on streaming replica :

Samples: 26K of event 'cpu-clock', Event count (approx.): 19781
  95.97%  postgres [.] 0x002210f3


Ok, so it's stuck doing something.. Can you get build with debug symbols 
installed, so that we could see the function name?

- Heikki



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Postgres Replaying WAL slowly


On Jun 30, 2014, at 9:14 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 On 06/30/2014 05:46 PM, Soni M wrote:
 Here's what 'perf top' said on streaming replica :
 
 Samples: 26K of event 'cpu-clock', Event count (approx.): 19781
  95.97%  postgres [.] 0x002210f3
 
 Ok, so it's stuck doing something.. Can you get build with debug symbols 
 installed, so that we could see the function name?
 - Heikki
 

Looks like StandbyReleaseLocks:

Samples: 10K of event 'cpu-clock', Event count (approx.): 8507
 89.21%  postgres  [.] StandbyReleaseLocks
  0.89%  libc-2.12.so  [.] __strstr_sse2
  0.83%  perf  [.] 0x0005f1e5
  0.74%  [kernel]  [k] kallsyms_expand_symbol
  0.52%  libc-2.12.so  [.] memchr
  0.47%  perf  [.] symbols__insert
  0.47%  [kernel]  [k] format_decode

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Postgres Replaying WAL slowly

On 2014-06-30 19:14:24 +0300, Heikki Linnakangas wrote:
 On 06/30/2014 05:46 PM, Soni M wrote:
 Here's what 'perf top' said on streaming replica :
 
 Samples: 26K of event 'cpu-clock', Event count (approx.): 19781
   95.97%  postgres [.] 0x002210f3
 
 Ok, so it's stuck doing something.. Can you get build with debug symbols
 installed, so that we could see the function name?

My guess it's a spinlock, probably xlogctl-info_lck via
RecoveryInProgress(). Unfortunately inline assembler doesn't always seem
to show up correctly in profiles...

What worked for me was to build with -fno-omit-frame-pointer - that
normally shows the callers, even if it can't generate a proper symbol
name.

Soni: Do you use Hot Standby? Are there connections active while you
have that problem? Any other processes with high cpu load?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Soni M

On Tue, Jul 1, 2014 at 12:14 AM, Andres Freund and...@2ndquadrant.com
wrote:


 My guess it's a spinlock, probably xlogctl-info_lck via
 RecoveryInProgress(). Unfortunately inline assembler doesn't always seem
 to show up correctly in profiles...

 What worked for me was to build with -fno-omit-frame-pointer - that
 normally shows the callers, even if it can't generate a proper symbol
 name.

 Soni: Do you use Hot Standby? Are there connections active while you
 have that problem? Any other processes with high cpu load?

 Greetings,

 Andres Freund

 --
  Andres Freund http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services


It is
 96.62%  postgres [.] StandbyReleaseLocks
 as Jeff said. It runs quite long time, more than 5 minutes i think

i also use hot standby. we have 4 streaming replica, some of them has
active connection some has not. this issue has last more than 4 days. On
one of the standby, above postgres process is the only process that consume
high cpu load.

-- 
Regards,

Soni Maula Harriz

Re: [PERFORM] Guidelines on best indexing strategy for varying searches on 20+ columns

2014-06-30 Thread Jeff Janes

On Wed, Jun 25, 2014 at 1:48 AM, Niels Kristian Schjødt
nielskrist...@autouncle.com wrote:
 Hi,
 I’m running a search engine for cars. It’s backed by a postgresql 9.3 
 installation.

 Now I’m unsure about the best approach/strategy on doing index optimization 
 for the fronted search.

 The problem:

 The table containing the cars holds a around 1,5 million rows. People that 
 searches for cars needs different criteria to search by. Some search by 
 brand/model, some by year, some by mileage, some by price and some by special 
 equipment etc. etc. - and often they combine a whole bunch of criteria 
 together. Of cause some, like brand/mode and price, are used more frequently 
 than others. In total we offer: 9 category criteria like brand/model or body 
 type, plus 5 numeric criteria like price or mileage, plus 12 boolean criteria 
 like equipment. Lastly people can order the results by different columns 
 (year, price, mileage and a score we create about the cars). By default we 
 order by our own generated score.

 What I’ve done so far:

 I have analyzed the usage of the criteria “lightly”, and created a few 
 indexes (10). Among those, are e.g. indexes on price, mileage and a combined 
 index on brand/model. Since we are only interested in showing results for 
 cars which is actually for sale, the indexes are made as partial indexes on a 
 sales state column.

I'd probably partition the data on whether it is for sale, and then
search only the for-sale partition.


 Questions:

 1. How would you go about analyzing and determining what columns should be 
 indexed, and how?

I'd start out with intuition about which columns are likely to be used
most often, and in a selective way.  And followup by logging slow
queries so they can be dissected at leisure.

 2. What is the best strategy when optimizing indexes for searches happening 
 on 20 + columns, where the use and the combinations varies a lot? (To just 
 index everything, to index some of the columns, to do combined indexes, to 
 only do single column indexes etc. etc.)

There is no magic index.  Based on your description, you are going to
be seq scanning your table a lot.  Focus on making it as small as
possible, but vertical partitioning it so that the not-for-sale
entries are hived off to an historical table, and horizontally
partitioning it so that large columns rarely used in the where clause
are in a separate table (Ideally you would tell postgresql to
aggressively toast those columns, but there is no knob with which to
do that)


 3. I expect that it does not make sense to index all columns?

You mean individually, or jointly?  Either way, probably not.

 4. I expect it does not make sense to index boolean columns?

In some cases it can, for example if the data distribution is very
lopsided and the value with the smaller side is frequently specified.

 5. Is it better to do a combined index on 5 frequently used columns rather 
 than having individual indexes on each of them?

How often are the columns specified together?  If they are completely
independent it probably makes little sense to index them together.

 6. Would it be a goof idea to have all indexes sorted by my default sorting?

You don't get to choose.  An btree index is sorted by the columns
specified in the index, according to the operators specified (or
defaulted).  Unless you mean that you want to add the default sort
column to be the lead column in each index, that actually might make
sense.

 7. Do you have so experiences with other approaches that could greatly 
 improve performance (e.g. forcing indexes to stay in memory etc.)?

If your queries are as unstructured as you imply, I'd forget about
indexes for the most part, as you will have a hard time findings ones
that work.  Concentrate on making seq scans as fast as possible.  If
most of your queries end in something like ORDER by price limit 10
then concentrate on index scans over price.  You will probably want to
include heuristics in your UI such that if people configure queries to
download half your database, you disallow that.  You will probably
find that 90% of the workload comes from people who are just playing
around with your website and don't actually intend to do business with
you.

Cheers,

Jeff


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Postgres Replaying WAL slowly

On Jun 30, 2014, at 10:29 AM, Soni M diptat...@gmail.com wrote:

 
 
 
 On Tue, Jul 1, 2014 at 12:14 AM, Andres Freund and...@2ndquadrant.com wrote:
 
 My guess it's a spinlock, probably xlogctl-info_lck via
 RecoveryInProgress(). Unfortunately inline assembler doesn't always seem
 to show up correctly in profiles...
 
 What worked for me was to build with -fno-omit-frame-pointer - that
 normally shows the callers, even if it can't generate a proper symbol
 name.
 
 Soni: Do you use Hot Standby? Are there connections active while you
 have that problem? Any other processes with high cpu load?
 
 Greetings,
 
 Andres Freund
 
 --
  Andres Freund http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services
 
 It is 
  96.62%  postgres [.] StandbyReleaseLocks
  as Jeff said. It runs quite long time, more than 5 minutes i think
 
 i also use hot standby. we have 4 streaming replica, some of them has active 
 connection some has not. this issue has last more than 4 days. On one of the 
 standby, above postgres process is the only process that consume high cpu 
 load.

compiled with -fno-omit-frame-pointer doesn't yield much more info:

 76.24%  postgres   [.] StandbyReleaseLocks
  2.64%  libcrypto.so.1.0.1e[.] md5_block_asm_data_order
  2.19%  libcrypto.so.1.0.1e[.] RC4
  2.17%  postgres   [.] RecordIsValid
  1.20%  [kernel]   [k] copy_user_generic_unrolled
  1.18%  [kernel]   [k] _spin_unlock_irqrestore
  0.97%  [vmxnet3]  [k] vmxnet3_poll_rx_only
  0.87%  [kernel]   [k] __do_softirq
  0.77%  [vmxnet3]  [k] vmxnet3_xmit_frame
  0.69%  postgres   [.] hash_search_with_hash_value
  0.68%  [kernel]   [k] fin

However, this server started progressing through the WAL files quite a bit 
better before I finished compiling, so we'll leave it running with this version 
and see if there's more info available the next time it starts replaying slowly.

Re: [PERFORM] Postgres Replaying WAL slowly

On 2014-06-30 11:34:52 -0700, Jeff Frost wrote:
 On Jun 30, 2014, at 10:29 AM, Soni M diptat...@gmail.com wrote:

  It is 
   96.62%  postgres [.] StandbyReleaseLocks
   as Jeff said. It runs quite long time, more than 5 minutes i think
  
  i also use hot standby. we have 4 streaming replica, some of them has 
  active connection some has not. this issue has last more than 4 days. On 
  one of the standby, above postgres process is the only process that consume 
  high cpu load.
 
 compiled with -fno-omit-frame-pointer doesn't yield much more info:

You'd need to do perf record -ga instead of perf record -a to see
additional information.

But:

  76.24%  postgres   [.] StandbyReleaseLocks

already is quite helpful.

What are you doing on that system? Is there anything requiring large
amounts of access exclusive locks on the primary? Possibly large amounts
of temporary relations?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Pavel Stehule

2014-06-30 20:34 GMT+02:00 Jeff Frost j...@pgexperts.com:

 On Jun 30, 2014, at 10:29 AM, Soni M diptat...@gmail.com wrote:




 On Tue, Jul 1, 2014 at 12:14 AM, Andres Freund and...@2ndquadrant.com
 wrote:


 My guess it's a spinlock, probably xlogctl-info_lck via
 RecoveryInProgress(). Unfortunately inline assembler doesn't always seem
 to show up correctly in profiles...


For this kind of issues a systemtap or dtrace can be useful

http://postgres.cz/wiki/Monitorov%C3%A1n%C3%AD_lwlocku_pomoc%C3%AD_systemtapu

you can identify what locking is a problem - please, use a google translate

Regards

Pavel


  What worked for me was to build with -fno-omit-frame-pointer - that
 normally shows the callers, even if it can't generate a proper symbol
 name.

 Soni: Do you use Hot Standby? Are there connections active while you
 have that problem? Any other processes with high cpu load?

 Greetings,

 Andres Freund

 --
  Andres Freund http://www.2ndQuadrant.com/
 http://www.2ndquadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services


 It is
  96.62%  postgres [.] StandbyReleaseLocks
  as Jeff said. It runs quite long time, more than 5 minutes i think

 i also use hot standby. we have 4 streaming replica, some of them has
 active connection some has not. this issue has last more than 4 days. On
 one of the standby, above postgres process is the only process that consume
 high cpu load.


 compiled with -fno-omit-frame-pointer doesn't yield much more info:

  76.24%  postgres   [.] StandbyReleaseLocks
   2.64%  libcrypto.so.1.0.1e[.]
 md5_block_asm_data_order
   2.19%  libcrypto.so.1.0.1e[.] RC4
   2.17%  postgres   [.] RecordIsValid
   1.20%  [kernel]   [k]
 copy_user_generic_unrolled
   1.18%  [kernel]   [k] _spin_unlock_irqrestore
   0.97%  [vmxnet3]  [k] vmxnet3_poll_rx_only
   0.87%  [kernel]   [k] __do_softirq
   0.77%  [vmxnet3]  [k] vmxnet3_xmit_frame
   0.69%  postgres   [.]
 hash_search_with_hash_value
   0.68%  [kernel]   [k] fin

 However, this server started progressing through the WAL files quite a bit
 better before I finished compiling, so we'll leave it running with this
 version and see if there's more info available the next time it starts
 replaying slowly.

Re: [PERFORM] Postgres Replaying WAL slowly

On Jun 30, 2014, at 11:39 AM, Andres Freund and...@2ndquadrant.com wrote:

 On 2014-06-30 11:34:52 -0700, Jeff Frost wrote:
 On Jun 30, 2014, at 10:29 AM, Soni M diptat...@gmail.com wrote:
 
 It is 
 96.62%  postgres [.] StandbyReleaseLocks
 as Jeff said. It runs quite long time, more than 5 minutes i think
 
 i also use hot standby. we have 4 streaming replica, some of them has 
 active connection some has not. this issue has last more than 4 days. On 
 one of the standby, above postgres process is the only process that consume 
 high cpu load.
 
 compiled with -fno-omit-frame-pointer doesn't yield much more info:
 
 You'd need to do perf record -ga instead of perf record -a to see
 additional information.
 

Ah! That's right.

Here's how that looks:

Samples: 473K of event 'cpu-clock', Event count (approx.): 473738
+  68.42%  init  [kernel.kallsyms] [k] 
native_safe_halt
+  26.07%  postgres  postgres  [.] 
StandbyReleaseLocks
+   2.82%   swapper  [kernel.kallsyms] [k] 
native_safe_halt
+   0.19%   ssh  libcrypto.so.1.0.1e   [.] 
md5_block_asm_data_order
+   0.19%  postgres  postgres  [.] 
RecordIsValid
+   0.16%   ssh  libcrypto.so.1.0.1e   [.] RC4
+   0.10%  postgres  postgres  [.] 
hash_search_with_hash_value
+   0.06%  postgres  [kernel.kallsyms] [k] 
_spin_unlock_irqrestore
+   0.05%  init  [vmxnet3] [k] 
vmxnet3_poll_rx_only
+   0.04%  postgres  [kernel.kallsyms] [k] 
copy_user_generic_unrolled
+   0.04%  init  [kernel.kallsyms] [k] 
finish_task_switch
+   0.04%  init  [kernel.kallsyms] [k] 
__do_softirq
+   0.04%   ssh  [kernel.kallsyms] [k] 
_spin_unlock_irqrestore
+   0.04%   ssh  [vmxnet3] [k] 
vmxnet3_xmit_frame
+   0.03%  postgres  postgres  [.] PinBuffer
+   0.03%  init  [vmxnet3] [k] 
vmxnet3_xmit_frame
+   0.03%   ssh  [kernel.kallsyms] [k] 
copy_user_generic_unrolled
+   0.03%  postgres  postgres  [.] 
XLogReadBufferExtended
+   0.03%   ssh  ssh   [.] 
0x0002aa07
+   0.03%  init  [kernel.kallsyms] [k] 
_spin_unlock_irqrestore
+   0.03%   ssh  [vmxnet3] [k] 
vmxnet3_poll_rx_only
+   0.02%   ssh  [kernel.kallsyms] [k] 
__do_softirq
+   0.02%  postgres  libc-2.12.so  [.] 
_wordcopy_bwd_dest_aligned
+   0.02%  postgres  postgres  [.] mdnblocks
+   0.02%   ssh  libcrypto.so.1.0.1e   [.] 
0x000e25a1
+   0.02%   scp  [kernel.kallsyms] [k] 
copy_user_generic_unrolled
+   0.02%   ssh  libc-2.12.so  [.] memcpy
+   0.02%  postgres  libc-2.12.so  [.] memcpy


 But:
 
 76.24%  postgres   [.] StandbyReleaseLocks
 
 already is quite helpful.
 
 What are you doing on that system? Is there anything requiring large
 amounts of access exclusive locks on the primary? Possibly large amounts
 of temporary relations?


The last time we did a 100% logging run, the peak temp table creation was 
something like 120k/hr, but the replicas seemed able to keep up with that just 
fine.

Hopefully Soni can answer whether that has increased significantly since May.



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Postgres Replaying WAL slowly

On Jun 30, 2014, at 12:17 PM, Jeff Frost j...@pgexperts.com wrote:

 
 already is quite helpful.
 
 What are you doing on that system? Is there anything requiring large
 amounts of access exclusive locks on the primary? Possibly large amounts
 of temporary relations?
 
 
 The last time we did a 100% logging run, the peak temp table creation was 
 something like 120k/hr, but the replicas seemed able to keep up with that 
 just fine.
 

Sampling pg_locks on the primary shows ~50 locks with ExclusiveLock mode:

   mode   | count
--+---
 AccessExclusiveLock  |11
 AccessShareLock  |  2089
 ExclusiveLock|46
 RowExclusiveLock |81
 RowShareLock |17
 ShareLock| 4
 ShareUpdateExclusiveLock | 5

Seems to be relatively consistent.  Of course, it's hard to say what it looked 
like back when the issue began.

Re: [PERFORM] Postgres Replaying WAL slowly

Jeff Frost j...@pgexperts.com writes:
 Sampling pg_locks on the primary shows ~50 locks with ExclusiveLock mode:

mode   | count
 --+---
  AccessExclusiveLock  |11
  AccessShareLock  |  2089
  ExclusiveLock|46
  RowExclusiveLock |81
  RowShareLock |17
  ShareLock| 4
  ShareUpdateExclusiveLock | 5

That's not too helpful if you don't pay attention to what the lock is on;
it's likely that all the ExclusiveLocks are on transactions' own XIDs,
which isn't relevant to the standby's behavior.  The AccessExclusiveLocks
are probably interesting though --- you should look to see what those
are on.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Postgres Replaying WAL slowly


On Jun 30, 2014, at 12:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Jeff Frost j...@pgexperts.com writes:
 Sampling pg_locks on the primary shows ~50 locks with ExclusiveLock mode:
 
   mode   | count
 --+---
 AccessExclusiveLock  |11
 AccessShareLock  |  2089
 ExclusiveLock|46
 RowExclusiveLock |81
 RowShareLock |17
 ShareLock| 4
 ShareUpdateExclusiveLock | 5
 
 That's not too helpful if you don't pay attention to what the lock is on;
 it's likely that all the ExclusiveLocks are on transactions' own XIDs,
 which isn't relevant to the standby's behavior.  The AccessExclusiveLocks
 are probably interesting though --- you should look to see what those
 are on.

You're right about the ExclusiveLocks.

Here's how the AccessExclusiveLocks look:

 locktype | database |  relation  | page | tuple | virtualxid | transactionid | 
classid |   objid| objsubid | virtualtransaction |  pid  |mode  
   | granted
--+--++--+---++---+-++--++---+-+-
 relation |   111285 | 3245291551 |  |   ||   | 
||  | 233/170813 | 23509 | 
AccessExclusiveLock | t
 relation |   111285 | 3245292820 |  |   ||   | 
||  | 5/22498235 | 23427 | 
AccessExclusiveLock | t
 relation |   111285 | 3245292833 |  |   ||   | 
||  | 173/1723993| 23407 | 
AccessExclusiveLock | t
 relation |   111285 | 3245287874 |  |   ||   | 
||  | 133/3818415| 23348 | 
AccessExclusiveLock | t
 relation |   111285 | 3245292836 |  |   ||   | 
||  | 173/1723993| 23407 | 
AccessExclusiveLock | t
 relation |   111285 | 3245292774 |  |   ||   | 
||  | 5/22498235 | 23427 | 
AccessExclusiveLock | t
 relation |   111285 | 3245292734 |  |   ||   | 
||  | 5/22498235 | 23427 | 
AccessExclusiveLock | t
 relation |   111285 | 3245292827 |  |   ||   | 
||  | 173/1723993| 23407 | 
AccessExclusiveLock | t
 relation |   111285 | 3245288540 |  |   ||   | 
||  | 133/3818415| 23348 | 
AccessExclusiveLock | t
 relation |   111285 | 3245292773 |  |   ||   | 
||  | 5/22498235 | 23427 | 
AccessExclusiveLock | t
 relation |   111285 | 3245292775 |  |   ||   | 
||  | 5/22498235 | 23427 | 
AccessExclusiveLock | t
 relation |   111285 | 3245292743 |  |   ||   | 
||  | 5/22498235 | 23427 | 
AccessExclusiveLock | t
 relation |   111285 | 3245292751 |  |   ||   | 
||  | 5/22498235 | 23427 | 
AccessExclusiveLock | t
 relation |   111285 | 3245288669 |  |   ||   | 
||  | 133/3818415| 23348 | 
AccessExclusiveLock | t
 relation |   111285 | 3245292817 |  |   ||   | 
||  | 5/22498235 | 23427 | 
AccessExclusiveLock | t
 relation |   111285 | 3245288657 |  |   ||   | 
||  | 133/3818415| 23348 | 
AccessExclusiveLock | t
 object   |   111285 ||  |   ||   | 
   2615 | 1246019760 |0 | 233/170813 | 23509 | 
AccessExclusiveLock | t
 relation |   111285 | 3245292746 |  |   ||   | 
||  | 5/22498235 | 23427 | 
AccessExclusiveLock | t
 relation |   111285 | 3245287876 |  |   ||   | 
||  | 133/3818415| 23348 | 
AccessExclusiveLock | t
 relation |   111285 | 3245292739 |  |   ||   | 
||  | 5/22498235 | 23427 | 
AccessExclusiveLock | t
 relation |   111285 | 3245292826 |  |   ||   | 
||  | 5/22498235 | 23427 | 
AccessExclusiveLock | t
 relation |   111285 | 3245292825 |  |   ||   | 
||  | 5/22498235 | 23427 | 
AccessExclusiveLock | t
 relation |   111285 | 3245292832 |  |   |

Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Matheus de Oliveira

On Mon, Jun 30, 2014 at 4:42 PM, Jeff Frost j...@pgexperts.com wrote:

 And if you go fishing in pg_class for any of the oids, you don't find
 anything:


That is probably because you are connected in the wrong database. Once you
connect to the database of interest, you don't even need to query pg_class,
just cast relation attribute to regclass:

SELECT relation::regclass, ...
FROM pg_locks WHERE database = (SELECT oid FROM pg_database WHERE
datname = current_database());

Regards,
-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres

Re: [PERFORM] Postgres Replaying WAL slowly


On Jun 30, 2014, at 12:54 PM, Matheus de Oliveira matioli.math...@gmail.com 
wrote:

 
 On Mon, Jun 30, 2014 at 4:42 PM, Jeff Frost j...@pgexperts.com wrote:
 And if you go fishing in pg_class for any of the oids, you don't find 
 anything:
 
 That is probably because you are connected in the wrong database. Once you 
 connect to the database of interest, you don't even need to query pg_class, 
 just cast relation attribute to regclass:
 
 SELECT relation::regclass, ...
 FROM pg_locks WHERE database = (SELECT oid FROM pg_database WHERE datname 
 = current_database());
 

Yah, i thought about that too, but verified I am in the correct DB.  Just for 
clarity sake:

SELECT relation::regclass
FROM pg_locks WHERE database = (SELECT oid FROM pg_database WHERE datname = 
current_database()) and mode = 'AccessExclusiveLock';

  relation



 3245508214
 3245508273
 3245508272
 3245508257
 3245508469
 3245508274
 3245508373
 3245508468
 3245508210
 3245508463
 3245508205
 3245508260
 3245508265
 3245508434
(16 rows)

Re: [PERFORM] Postgres Replaying WAL slowly

On 2014-06-30 12:57:56 -0700, Jeff Frost wrote:
 
 On Jun 30, 2014, at 12:54 PM, Matheus de Oliveira matioli.math...@gmail.com 
 wrote:
 
  
  On Mon, Jun 30, 2014 at 4:42 PM, Jeff Frost j...@pgexperts.com wrote:
  And if you go fishing in pg_class for any of the oids, you don't find 
  anything:
  
  That is probably because you are connected in the wrong database. Once you 
  connect to the database of interest, you don't even need to query pg_class, 
  just cast relation attribute to regclass:
  
  SELECT relation::regclass, ...
  FROM pg_locks WHERE database = (SELECT oid FROM pg_database WHERE 
  datname = current_database());
  
 
 Yah, i thought about that too, but verified I am in the correct DB.  Just for 
 clarity sake:

So these are probably relations created in uncommitted
transactions. Possibly ON COMMIT DROP temp tables?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Postgres Replaying WAL slowly


On Jun 30, 2014, at 1:15 PM, Andres Freund and...@2ndquadrant.com wrote:

 On 2014-06-30 12:57:56 -0700, Jeff Frost wrote:
 
 On Jun 30, 2014, at 12:54 PM, Matheus de Oliveira 
 matioli.math...@gmail.com wrote:
 
 
 On Mon, Jun 30, 2014 at 4:42 PM, Jeff Frost j...@pgexperts.com wrote:
 And if you go fishing in pg_class for any of the oids, you don't find 
 anything:
 
 That is probably because you are connected in the wrong database. Once you 
 connect to the database of interest, you don't even need to query pg_class, 
 just cast relation attribute to regclass:
 
SELECT relation::regclass, ...
FROM pg_locks WHERE database = (SELECT oid FROM pg_database WHERE 
 datname = current_database());
 
 
 Yah, i thought about that too, but verified I am in the correct DB.  Just 
 for clarity sake:
 
 So these are probably relations created in uncommitted
 transactions. Possibly ON COMMIT DROP temp tables?


That would make sense.  There are definitely quite a few of those being used.

Another item of note is the system catalogs are quite bloated:

 schemaname |  tablename   | tbloat | wastedmb | idxbloat | wastedidxmb
+--++--+--+-
 pg_catalog | pg_attribute |   3945 |   106.51 | 2770 |  611.28
 pg_catalog | pg_class |   8940 |45.26 | 4420 |   47.89
 pg_catalog | pg_type  |   4921 |18.45 | 5850 |   81.16
 pg_catalog | pg_depend|933 |10.23 |11730 |  274.37
 pg_catalog | pg_index |   3429 | 8.36 | 3920 |   24.24
 pg_catalog | pg_shdepend  |983 | 2.67 | 9360 |   30.56
(6 rows)

Would that cause the replica to spin on StandbyReleaseLocks?

Re: [PERFORM] Postgres Replaying WAL slowly

Jeff Frost j...@pgexperts.com writes:
 On Jun 30, 2014, at 1:15 PM, Andres Freund and...@2ndquadrant.com wrote:
 So these are probably relations created in uncommitted
 transactions. Possibly ON COMMIT DROP temp tables?

 That would make sense.  There are definitely quite a few of those being used.

Uh-huh.  I doubt that the mechanism that handles propagation of
AccessExclusiveLocks to the standby is smart enough to ignore locks
on temp tables :-(

 Another item of note is the system catalogs are quite bloated:
 Would that cause the replica to spin on StandbyReleaseLocks?

AFAIK, no.  It's an unsurprising consequence of heavy use of short-lived
temp tables though.

So it seems like we have a candidate explanation.  I'm a bit surprised
that StandbyReleaseLocks would get this slow if there are only a dozen
AccessExclusiveLocks in place at any one time, though.  Perhaps that
was a low point and there are often many more?

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Postgres Replaying WAL slowly


On Jun 30, 2014, at 1:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 
 
 Another item of note is the system catalogs are quite bloated:
 Would that cause the replica to spin on StandbyReleaseLocks?
 
 AFAIK, no.  It's an unsurprising consequence of heavy use of short-lived
 temp tables though.
 

Yah, this has been an issue in the past, so we tend to cluster them regularly 
during off-hours to minimize the issue.

 So it seems like we have a candidate explanation.  I'm a bit surprised
 that StandbyReleaseLocks would get this slow if there are only a dozen
 AccessExclusiveLocks in place at any one time, though.  Perhaps that
 was a low point and there are often many more?
 
   

Entirely possible that it was a low point.  We'll set up some monitoring to 
track the number of AccessExclusiveLocks and see how much variance there is 
throughout the day.




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Postgres Replaying WAL slowly


On Jun 30, 2014, at 1:46 PM, Jeff Frost j...@pgexperts.com wrote:

 So it seems like we have a candidate explanation.  I'm a bit surprised
 that StandbyReleaseLocks would get this slow if there are only a dozen
 AccessExclusiveLocks in place at any one time, though.  Perhaps that
 was a low point and there are often many more?
 
  
 
 Entirely possible that it was a low point.  We'll set up some monitoring to 
 track the number of AccessExclusiveLocks and see how much variance there is 
 throughout the day.


Since we turned on the monitoring for that, we had a peak of 13,550 
AccessExclusiveLocks.  So far most of the samples have been in the double 
digit, with that and two other outliers: 6,118 and 12,747.

Re: [PERFORM] Postgres Replaying WAL slowly

Jeff Frost j...@pgexperts.com writes:
 So it seems like we have a candidate explanation.  I'm a bit surprised
 that StandbyReleaseLocks would get this slow if there are only a dozen
 AccessExclusiveLocks in place at any one time, though.  Perhaps that
 was a low point and there are often many more?

 Since we turned on the monitoring for that, we had a peak of 13,550
 AccessExclusiveLocks.

Ah ... that's more like a number I can believe something would have
trouble coping with.  Did you see a noticeable slowdown with this?
Now that we've seen that number, of course it's possible there was an
even higher peak occurring when you saw the trouble.

Perhaps there's an O(N^2) behavior in StandbyReleaseLocks, or maybe
it just takes awhile to handle that many locks.

Did you check whether the locks were all on temp tables of the
ON COMMIT DROP persuasion?

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Postgres Replaying WAL slowly

On 2014-06-30 19:04:20 -0400, Tom Lane wrote:
 Jeff Frost j...@pgexperts.com writes:
  So it seems like we have a candidate explanation.  I'm a bit surprised
  that StandbyReleaseLocks would get this slow if there are only a dozen
  AccessExclusiveLocks in place at any one time, though.  Perhaps that
  was a low point and there are often many more?
 
  Since we turned on the monitoring for that, we had a peak of 13,550
  AccessExclusiveLocks.

Any chance the workload also uses lots of subtransactions?

 Ah ... that's more like a number I can believe something would have
 trouble coping with.  Did you see a noticeable slowdown with this?
 Now that we've seen that number, of course it's possible there was an
 even higher peak occurring when you saw the trouble.
 
 Perhaps there's an O(N^2) behavior in StandbyReleaseLocks, or maybe
 it just takes awhile to handle that many locks.

I don't think there's a O(n^2) in StandbyReleaseLocks() itself, but in
combination with StandbyReleaseLockTree() it looks possibly bad. The
latter will call StandbyReleaseLocks() for every xid/subxid, and each of
the StandbyReleaseLocks() will then trawl the entire RecoveryLockList...

It'd probably be better to implement ReleaseLocksTree() by sorting the
subxid list and bsearch that while iterating RecoveryLockList.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Postgres Replaying WAL slowly


On Jun 30, 2014, at 4:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Ah ... that's more like a number I can believe something would have
 trouble coping with.  Did you see a noticeable slowdown with this?
 Now that we've seen that number, of course it's possible there was an
 even higher peak occurring when you saw the trouble.
 
 Perhaps there's an O(N^2) behavior in StandbyReleaseLocks, or maybe
 it just takes awhile to handle that many locks.
 
 Did you check whether the locks were all on temp tables of the
 ON COMMIT DROP persuasion?


Unfortunately not, because I went for a poor man's: SELECT count(*) FROM 
pg_locks WHERE mode = 'AccessExclusiveLock' 
run in cron every minute.

That said, I'd bet it was mostly ON COMMIT DROP temp tables.

The unfortunate thing is I wouldn't know how to correlate that spike with the 
corresponding slowdown because the replica is about 5.5hrs lagged at the moment.

Hopefully it will get caught up tonight and we can see if there's a correlation 
tomorrow.

Re: [PERFORM] Volatility - docs vs behaviour?

2014-06-30 Thread Craig Ringer

On 06/30/2014 11:49 PM, Tom Lane wrote:
Craig Ringer cr...@2ndquadrant.com writes:
The docs say:

For best optimization results, you should label your functions with the
strictest volatility category that is valid for them.

Yeah ...

Yes, I realise that's the case with this particular incident. It's the
more general case I'm interested in - whether this can be true in
general, not just when the user does something dumb.

It sounds like you're saying that the behaviour observed here is
specific to cases where the user incorrectly identifies the function
volatility. In which case we don't care, that's fine, no problem here.

My concern was only with whether the advice that the highest volatility
category should be used is always true for *correct* immutable functions
too.

I know STRICT can prevent inlining (unfortunately, though necessarily),
but it seems inexplicable that IMMUTABLE should.

If the input is constant, then clearly it should be evaluated and a
constant substituted.

If it _isn't_ a constant input, then why would STRICT inline when
IMMUTABLE doesn't?

I was unaware that the planner made any attempt to catch users' errors
in marking the strictness of functions. I thought it pretty much trusted
the user not to lie about the mutability of functions invoked
indirectly. I'm not really sure where in the inlining code to look to
figure that out.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Volatility - docs vs behaviour?