Re: [PERFORM] Performance issues

2015-03-16 Thread Tomas Vondra
On 16.3.2015 20:43, Jim Nasby wrote:
 On 3/13/15 7:12 PM, Tomas Vondra wrote:
 (4) I suspect many of the relations referenced in the views are not
  actually needed in the query, i.e. the join is performed but
  then it's just discarded because those columns are not used.
  Try to simplify the views as much as possible - remove all the
  tables that are not really necessary to run the query. If two
  queries need different tables, maybe defining two views is
  a better approach.
 
 A better alternative with multi-purpose views is to use an outer
 join instead of an inner join. With an outer join if you ultimately
 don't refer to any of the columns in a particular table Postgres will
 remove the table from the query completely.

Really? Because a quick test suggests otherwise:

db=# create table test_a (id int);
CREATE TABLE
db=# create table test_b (id int);
CREATE TABLE
db=# explain select test_a.* from test_a left join test_b using (id);
                             QUERY PLAN
---------------------------------------------------------------------
 Merge Left Join  (cost=359.57..860.00 rows=32512 width=4)
   Merge Cond: (test_a.id = test_b.id)
   ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
         Sort Key: test_a.id
         ->  Seq Scan on test_a  (cost=0.00..35.50 rows=2550 width=4)
   ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
         Sort Key: test_b.id
         ->  Seq Scan on test_b  (cost=0.00..35.50 rows=2550 width=4)
(8 rows)

Also, how would that work with duplicate rows in the referenced table?


-- 
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

2015-03-16 Thread Gunnlaugur Thor Briem
On Mon, Mar 16, 2015 at 7:24 PM, Jim Nasby jim.na...@bluetreble.com wrote:

 The other thing you should consider is using TRUNCATE instead of an
 un-filtered DELETE. It will both be much faster to perform and won't leave
 any dead rows behind.


Yep, but it does take an ACCESS EXCLUSIVE lock. We want the old table
contents to be readable to other sessions while the new table contents are
being populated (which can take quite a while), hence we don't use TRUNCATE.

Best of both worlds is to just populate a new table, flip over to that when
it's ready, and drop the old one once nobody's referring to it anymore.
That way we don't pay the DELETE scan penalty and don't leave dead rows,
and also don't lock reads out while we repopulate.
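
A minimal sketch of that swap, for illustration (table names made up; the
SELECT stands in for the real repopulation):

-- build the replacement while readers keep using the old table
CREATE TABLE items_new (LIKE items INCLUDING ALL);
INSERT INTO items_new SELECT * FROM items_source;  -- the long-running repopulation

-- the swap itself needs its locks only for an instant
BEGIN;
ALTER TABLE items RENAME TO items_old;
ALTER TABLE items_new RENAME TO items;
COMMIT;

-- once no sessions reference the old contents any more
DROP TABLE items_old;

(Anything that points at the old table by OID, e.g. views and foreign keys,
has to be recreated against the new one.)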

Gulli


Re: [PERFORM] Performance issues

2015-03-16 Thread Jim Nasby

On 3/16/15 3:59 PM, Tomas Vondra wrote:

On 16.3.2015 20:43, Jim Nasby wrote:

On 3/13/15 7:12 PM, Tomas Vondra wrote:

(4) I suspect many of the relations referenced in the views are not
  actually needed in the query, i.e. the join is performed but
  then it's just discarded because those columns are not used.
  Try to simplify the views as much as possible - remove all the
  tables that are not really necessary to run the query. If two
  queries need different tables, maybe defining two views is
  a better approach.


A better alternative with multi-purpose views is to use an outer
join instead of an inner join. With an outer join if you ultimately
don't refer to any of the columns in a particular table Postgres will
remove the table from the query completely.


Really? Because a quick test suggests otherwise:

db=# create table test_a (id int);
CREATE TABLE
db=# create table test_b (id int);
CREATE TABLE
db=# explain select test_a.* from test_a left join test_b using (id);
                             QUERY PLAN
---------------------------------------------------------------------
 Merge Left Join  (cost=359.57..860.00 rows=32512 width=4)
   Merge Cond: (test_a.id = test_b.id)
   ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
         Sort Key: test_a.id
         ->  Seq Scan on test_a  (cost=0.00..35.50 rows=2550 width=4)
   ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
         Sort Key: test_b.id
         ->  Seq Scan on test_b  (cost=0.00..35.50 rows=2550 width=4)
(8 rows)

Also, how would that work with duplicate rows in the referenced table?


Right, I neglected to mention that the omitted table must also be unique 
on the join key:


decibel@decina.attlocal=# create table a(a_id serial primary key);
CREATE TABLE
decibel@decina.attlocal=# create table b(a_id int);
CREATE TABLE
decibel@decina.attlocal=# explain analyze select a.* from a left join b using(a_id);
                                             QUERY PLAN
------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=67.38..137.94 rows=2550 width=4) (actual time=0.035..0.035 rows=0 loops=1)
   Hash Cond: (b.a_id = a.a_id)
   ->  Seq Scan on b  (cost=0.00..35.50 rows=2550 width=4) (never executed)
   ->  Hash  (cost=35.50..35.50 rows=2550 width=4) (actual time=0.002..0.002 rows=0 loops=1)
         Buckets: 4096  Batches: 1  Memory Usage: 32kB
         ->  Seq Scan on a  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.001..0.001 rows=0 loops=1)
 Planning time: 0.380 ms
 Execution time: 0.086 ms
(8 rows)

decibel@decina.attlocal=# alter table b add primary key(a_id);
ALTER TABLE
decibel@decina.attlocal=# explain analyze select a.* from a left join b using(a_id);
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.001..0.001 rows=0 loops=1)
 Planning time: 0.247 ms
 Execution time: 0.029 ms
(3 rows)

decibel@decina.attlocal=# alter table a drop constraint a_pkey;
ALTER TABLE
decibel@decina.attlocal=# explain analyze select a.* from a left join b using(a_id);
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.001..0.001 rows=0 loops=1)
 Planning time: 0.098 ms
 Execution time: 0.011 ms
(3 rows)
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [PERFORM] MusicBrainz postgres performance issues

2015-03-16 Thread Scott Marlowe
On Mon, Mar 16, 2015 at 6:59 AM, Robert Kaye r...@musicbrainz.org wrote:

 4. Linux 3.2 apparently has some less than desirable swap behaviours. Once
 we started swapping, everything went nuts.

On older machines I used to just turn off swap altogether. Especially if I
wasn't running out of memory but swap was engaging anyway. swappiness = 0
didn't help, nothing did; I just kept seeing kswapd working its butt off
doing nothing but hitting the swap partition.

So glad to be off those old kernels.




Re: [PERFORM] MusicBrainz postgres performance issues

2015-03-16 Thread Thomas Kellerer
Robert Kaye wrote on 16.03.2015 at 13:59:
 However, I am glad to report that our problems are fixed and that our
 server is back to humming along nicely.
 
 And as I said to Josh earlier: “Postgres rocks our world. I’m
 immensely pleased that once again the problems were our own stupidity
 and not PG’s fault. In over 10 years of us using PG, it has never
 been PG’s fault. Not once.”
 
 And thus we’re one tiny bit smarter today. Thank you everyone!
 

I think it would be nice if you could amend your blog posting to include the
solution that you found.

Otherwise this will simply stick around as yet another unsolved performance
problem.

Thomas





Re: [PERFORM] MusicBrainz postgres performance issues

2015-03-16 Thread Robert Kaye

 On Mar 16, 2015, at 2:22 PM, Thomas Kellerer spam_ea...@gmx.net wrote:
 
 I think it would be nice if you can amend your blog posting to include the 
 solution that you found. 
 
 Otherwise this will simply stick around as yet another unsolved performance 
 problem


Good thinking:

  http://blog.musicbrainz.org/2015/03/16/postgres-troubles-resolved/

I’ve also updated the original post with the link to the above. Case closed. :)

--

--ruaok

Robert Kaye -- r...@musicbrainz.org -- http://musicbrainz.org





Re: [PERFORM] MusicBrainz postgres performance issues

2015-03-16 Thread Andreas Kretschmer
Robert Kaye r...@musicbrainz.org wrote:

 However, I am glad to report that our problems are fixed and that our server 
 is
 back to humming along nicely. 
 
 What we changed:
 
 1. As it was pointed out here, max_connections of 500 was in fact insanely
 high, especially in light of using PGbouncer. Before we used PGbouncer we
 needed a lot more connections and when we started using PGbouncer, we never
 reduced this number.
 
 2. Our server_lifetime was set far too high (1 hour). Josh Berkus suggested
 lowering that to 5 minutes.
 
 3. We reduced the number of PGbouncer active connections to the DB.
 


Many thanks for the feedback!


Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.  (Linus Torvalds)
If I was god, I would recompile penguin with --enable-fly.   (unknown)
Kaufbach, Saxony, Germany, Europe.  N 51.05082°, E 13.56889°




Re: [PERFORM] MusicBrainz postgres performance issues

2015-03-16 Thread Robert Kaye
On March 16, 2015 at 3:24:34 AM, Roxanne Reid-Bennett (r...@tara-lu.com) wrote:
Robert,

Wow - You've engaged the wizards indeed.

I haven't heard or seen anything that would answer my *second* question if 
faced with this (my first would have been what changed)

Yes, indeed — I feel honored to have so many people chime into this issue.

The problem was that nothing abnormal was happening — just the normal queries 
were running that hadn’t given us any problems for months. We undid everything 
that had been recently changed in an effort to address “what changed”. Nothing 
helped, which is what had us so perplexed.

However, I am glad to report that our problems are fixed and that our server is 
back to humming along nicely. 

What we changed:

1. As it was pointed out here, max_connections of 500 was in fact insanely 
high, especially in light of using PGbouncer. Before we used PGbouncer we 
needed a lot more connections and when we started using PGbouncer, we never 
reduced this number.

2. Our server_lifetime was set far too high (1 hour). Josh Berkus suggested 
lowering that to 5 minutes.

3. We reduced the number of PGbouncer active connections to the DB.

What we learned:

1. We had too many backends

2. The backends were being kept around for too long by PGbouncer.

3. This caused too many idle backends to kick around. Once we exhausted 
physical ram, we started swapping.

4. Linux 3.2 apparently has some less than desirable swap behaviours. Once we 
started swapping, everything went nuts. 

Going forward we’re going to upgrade our kernel the next time we have downtime 
for our site, and the rest should be sorted now.
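
For reference, a rough sketch of where those knobs live (the numbers are purely
illustrative, not the values we actually settled on):

-- PostgreSQL side: keep the hard connection ceiling modest when a pooler sits in front
ALTER SYSTEM SET max_connections = 100;   -- takes effect only after a restart
-- (on pre-9.4 servers, edit postgresql.conf directly instead)

-- PgBouncer side (pgbouncer.ini, not SQL) -- the settings discussed above:
--   default_pool_size = 20     ; active server connections per database/user pair
--   server_lifetime   = 300    ; seconds; the "5 minutes" suggestion (default is 3600)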

I wanted to thank everyone who contributed their thoughts to this thread — 
THANK YOU.

And as I said to Josh earlier: “Postgres rocks our world. I’m immensely pleased 
that once again the problems were our own stupidity and not PG’s fault. In over 
10 years of us using PG, it has never been PG’s fault. Not once.”

And thus we’re one tiny bit smarter today. Thank you everyone!



P.S. If anyone would still like to get some more information about this problem 
for their own edification, please let me know. Given that we’ve fixed the 
issue, I don’t want to spam this list by responding to all the questions that 
were posed.


--

--ruaok    

Robert Kaye -- r...@musicbrainz.org --    http://musicbrainz.org

Re: [PERFORM] Performance issues

2015-03-16 Thread Vivekanand Joshi
Hi Team,

This is the EXPLAIN ANALYZE output for one of the views: S_V_D_CAMPAIGN_HIERARCHY:

===


Nested Loop  (cost=33666.96..37971.39 rows=1 width=894) (actual
time=443.556..966558.767 rows=45360 loops=1)
   Join Filter: (tp_exec.touchpoint_execution_id =
valid_executions.touchpoint_execution_id)
   Rows Removed by Join Filter: 3577676116
   CTE valid_executions
 -  Hash Join  (cost=13753.53..31711.17 rows=1 width=8) (actual
time=232.571..357.749 rows=52997 loops=1)
   Hash Cond:
((s_f_touchpoint_execution_status_history_1.touchpoint_execution_id =
s_f_touchpoint_execution_status_history.touchpoint_execution_id) AND ((max(s
_f_touchpoint_execution_status_history_1.creation_dt)) =
s_f_touchpoint_execution_status_history.creation_dt))
   -  HashAggregate  (cost=6221.56..6905.66 rows=68410 width=16)
(actual time=139.713..171.340 rows=76454 loops=1)
 -  Seq Scan on s_f_touchpoint_execution_status_history
s_f_touchpoint_execution_status_history_1  (cost=0.00..4766.04 rows=291104
width=16) (actual ti
me=0.006..38.582 rows=291104 loops=1)
   -  Hash  (cost=5493.80..5493.80 rows=135878 width=16) (actual
time=92.737..92.737 rows=136280 loops=1)
 Buckets: 16384  Batches: 1  Memory Usage: 6389kB
 -  Seq Scan on s_f_touchpoint_execution_status_history
(cost=0.00..5493.80 rows=135878 width=16) (actual time=0.012..55.078
rows=136280 loops=1)
   Filter: (touchpoint_execution_status_type_id = ANY
('{3,4}'::integer[]))
   Rows Removed by Filter: 154824
   -  Nested Loop Left Join  (cost=1955.80..6260.19 rows=1 width=894)
(actual time=31.608..3147.015 rows=67508 loops=1)
 -  Nested Loop  (cost=1955.67..6260.04 rows=1 width=776) (actual
time=31.602..2912.625 rows=67508 loops=1)
   -  Nested Loop Left Join  (cost=1955.54..6259.87 rows=1
width=658) (actual time=31.595..2713.696 rows=72427 loops=1)
 -  Nested Loop Left Join  (cost=1955.40..6259.71
rows=1 width=340) (actual time=31.589..2532.926 rows=72427 loops=1)
   -  Nested Loop Left Join  (cost=1955.27..6259.55
rows=1 width=222) (actual time=31.581..2354.662 rows=72427 loops=1)
 -  Nested Loop  (cost=1954.99..6259.24
rows=1 width=197) (actual time=31.572..2090.104 rows=72427 loops=1)
   -  Nested Loop
(cost=1954.71..6258.92 rows=1 width=173) (actual time=31.562..1802.857
rows=72427 loops=1)
 Join Filter:
(camp_exec.campaign_id = wave.campaign_id)
 Rows Removed by Join Filter:
243
 -  Nested Loop
(cost=1954.42..6254.67 rows=13 width=167) (actual time=31.551..1468.718
rows=72670 loops=1)
   -  Hash Join
(cost=1954.13..6249.67 rows=13 width=108) (actual time=31.525..402.039
rows=72670 loops=1)
 Hash Cond:
((tp_exec.touchpoint_id = tp.touchpoint_id) AND (wave_exec.wave_id =
tp.wave_id))
 -  Hash Join
(cost=1576.83..4595.51 rows=72956 width=90) (actual time=26.254..256.328
rows=72956 loops=1)
   Hash Cond:
(tp_exec.wave_execution_id = wave_exec.wave_execution_id)
   -  Seq Scan
on s_d_touchpoint_execution tp_exec  (cost=0.00..1559.56 rows=72956
width=42) (actual time=0.005..76.099 rows=72956 loops=1)
   -  Hash
(cost=1001.37..1001.37 rows=46037 width=56) (actual time=26.178..26.178
rows=46037 loops=1)
 Buckets:
8192  Batches: 1  Memory Usage: 4104kB
 -  Seq
Scan on s_d_wave_execution wave_exec  (cost=0.00..1001.37 rows=46037
width=56) (actual time=0.006..10.388 rows=46037 loops=1)
 -  Hash
(cost=212.72..212.72 rows=10972 width=26) (actual time=5.252..5.252
rows=10972 loops=1)
   Buckets: 2048
Batches: 1  Memory Usage: 645kB
   -  Seq Scan
on s_d_touchpoint tp  (cost=0.00..212.72 rows=10972 width=26) (actual
time=0.012..2.319 rows=10972 loops=1)
   -  Index Scan using
s_d_campaign_execution_idx on s_d_campaign_execution camp_exec
(cost=0.29..0.37 rows=1 width=67) (actual time=0.013..0.013 rows=1
loops=72670)
 Index Cond:
(campaign_execution_id = wave_exec.campaign_execution_id)
 -  

Re: [PERFORM] Bad cost estimate with FALSE filter condition

2015-03-16 Thread Josh Berkus
So ... should I assume my diagnosis is correct?  Haven't heard any other
suggestions.

On 02/27/2015 05:28 PM, Josh Berkus wrote:
 All:
 
 This got posted to pgsql-bugs, but got no attention there[1], so I'm
 sending it to this list.
 
 Test case:
 
 createdb bench
 pgbench -i -s bench
 \c bench
 
 bench=# explain select * from pgbench_accounts where aid = 2;
                                        QUERY PLAN
 -----------------------------------------------------------------------------------------------
  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.42..8.44 rows=1 width=97)
    Index Cond: (aid = 2)
 (2 rows)
 
 bench=# explain select * from pgbench_accounts where aid = 2 and false;
                            QUERY PLAN
 ---------------------------------------------------------------
  Result  (cost=0.00..26394.00 rows=1 width=97)
    One-Time Filter: false
    ->  Seq Scan on pgbench_accounts  (cost=0.00..26394.00 rows=1 width=97)
 (3 rows)
 
 This seems like a special case of the aborted plan cost, that is, when
 the planner expects to abort a plan early, it nevertheless returns the
 full cost for the non-aborted version of the query, rather than the
 working cost, which is based on the abort.
 
 For example:
 
 bench=# create index on pgbench_accounts(bid);
 CREATE INDEX
 bench=# explain select * from pgbench_accounts where bid = 2;
                                             QUERY PLAN
 ----------------------------------------------------------------------------------------------------
  Index Scan using pgbench_accounts_bid_idx on pgbench_accounts  (cost=0.42..4612.10 rows=102667 width=97)
    Index Cond: (bid = 2)
 (2 rows)
 
 bench=# explain select * from pgbench_accounts where bid = 2 limit 1;
                                  QUERY PLAN
 ---------------------------------------------------------------------------
  Limit  (cost=0.00..0.28 rows=1 width=97)
    ->  Seq Scan on pgbench_accounts  (cost=0.00..28894.00 rows=102667 width=97)
          Filter: (bid = 2)
 (3 rows)
 
 So in this case, the top-level node returns a lower cost because the
 planner knows that it will find a row with bid=2 fairly quickly in the
 seq scan.  But in the WHERE FALSE example, that scan *is* the top-level
 node, so the planner returns a fictitious cost for the whole query.
 
 Or is there something else at work here?
 
 [1]
 http://www.postgresql.org/message-id/20150225194953.2546.86...@wrigleys.postgresql.org
 


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [PERFORM] MusicBrainz postgres performance issues

2015-03-16 Thread Josh Berkus
On 03/16/2015 05:59 AM, Robert Kaye wrote:
 4. Linux 3.2 apparently has some less than desirable swap behaviours.
 Once we started swapping, everything went nuts. 

Relevant to this:

http://www.databasesoup.com/2014/09/why-you-need-to-avoid-linux-kernel-32.html

Anybody who is on Linux Kernels 3.0 to 3.8 really needs to upgrade soon.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [PERFORM] Performance issues

2015-03-16 Thread Marc Mamin

Hi Team,

This is the EXPLAIN ANALYZE for one of the view : S_V_D_CAMPAIGN_HIERARCHY:


Rows Removed by Join Filter: 3577676116

That's quite a lot.
You're possibly missing a clause in a join, resulting in a cross join.
It is also helpful to put your result here:
http://explain.depesz.com/
regards,

Marc Mamin



===


Nested Loop  (cost=33666.96..37971.39 rows=1 width=894) (actual
time=443.556..966558.767 rows=45360 loops=1)
   Join Filter: (tp_exec.touchpoint_execution_id =
valid_executions.touchpoint_execution_id)
   Rows Removed by Join Filter: 3577676116
   CTE valid_executions
 -  Hash Join  (cost=13753.53..31711.17 rows=1 width=8) (actual
time=232.571..357.749 rows=52997 loops=1)
   Hash Cond:
((s_f_touchpoint_execution_status_history_1.touchpoint_execution_id =
s_f_touchpoint_execution_status_history.touchpoint_execution_id) AND ((max(s
_f_touchpoint_execution_status_history_1.creation_dt)) =
s_f_touchpoint_execution_status_history.creation_dt))
   -  HashAggregate  (cost=6221.56..6905.66 rows=68410 width=16)
(actual time=139.713..171.340 rows=76454 loops=1)
 -  Seq Scan on s_f_touchpoint_execution_status_history
s_f_touchpoint_execution_status_history_1  (cost=0.00..4766.04 rows=291104
width=16) (actual ti
me=0.006..38.582 rows=291104 loops=1)
   -  Hash  (cost=5493.80..5493.80 rows=135878 width=16) (actual
time=92.737..92.737 rows=136280 loops=1)
 Buckets: 16384  Batches: 1  Memory Usage: 6389kB
 -  Seq Scan on s_f_touchpoint_execution_status_history
(cost=0.00..5493.80 rows=135878 width=16) (actual time=0.012..55.078
rows=136280 loops=1)
   Filter: (touchpoint_execution_status_type_id = ANY
('{3,4}'::integer[]))
   Rows Removed by Filter: 154824
   -  Nested Loop Left Join  (cost=1955.80..6260.19 rows=1 width=894)
(actual time=31.608..3147.015 rows=67508 loops=1)
 -  Nested Loop  (cost=1955.67..6260.04 rows=1 width=776) (actual
time=31.602..2912.625 rows=67508 loops=1)
   -  Nested Loop Left Join  (cost=1955.54..6259.87 rows=1
width=658) (actual time=31.595..2713.696 rows=72427 loops=1)
 -  Nested Loop Left Join  (cost=1955.40..6259.71
rows=1 width=340) (actual time=31.589..2532.926 rows=72427 loops=1)
   -  Nested Loop Left Join  (cost=1955.27..6259.55
rows=1 width=222) (actual time=31.581..2354.662 rows=72427 loops=1)
 -  Nested Loop  (cost=1954.99..6259.24
rows=1 width=197) (actual time=31.572..2090.104 rows=72427 loops=1)
   -  Nested Loop
(cost=1954.71..6258.92 rows=1 width=173) (actual time=31.562..1802.857
rows=72427 loops=1)
 Join Filter:
(camp_exec.campaign_id = wave.campaign_id)
 Rows Removed by Join Filter:
243
 -  Nested Loop
(cost=1954.42..6254.67 rows=13 width=167) (actual time=31.551..1468.718
rows=72670 loops=1)
   -  Hash Join
(cost=1954.13..6249.67 rows=13 width=108) (actual time=31.525..402.039
rows=72670 loops=1)
 Hash Cond:
((tp_exec.touchpoint_id = tp.touchpoint_id) AND (wave_exec.wave_id =
tp.wave_id))
 -  Hash Join
(cost=1576.83..4595.51 rows=72956 width=90) (actual time=26.254..256.328
rows=72956 loops=1)
   Hash Cond:
(tp_exec.wave_execution_id = wave_exec.wave_execution_id)
   -  Seq Scan
on s_d_touchpoint_execution tp_exec  (cost=0.00..1559.56 rows=72956
width=42) (actual time=0.005..76.099 rows=72956 loops=1)
   -  Hash
(cost=1001.37..1001.37 rows=46037 width=56) (actual time=26.178..26.178
rows=46037 loops=1)
 Buckets:
8192  Batches: 1  Memory Usage: 4104kB
 -  Seq
Scan on s_d_wave_execution wave_exec  (cost=0.00..1001.37 rows=46037
width=56) (actual time=0.006..10.388 rows=46037 loops=1)
 -  Hash
(cost=212.72..212.72 rows=10972 width=26) (actual time=5.252..5.252
rows=10972 loops=1)
   Buckets: 2048
Batches: 1  Memory Usage: 645kB
   -  Seq Scan
on s_d_touchpoint tp  (cost=0.00..212.72 rows=10972 width=26) (actual
time=0.012..2.319 rows=10972 loops=1)
   -  Index Scan using
s_d_campaign_execution_idx on 

Re: [PERFORM] Performance issues

2015-03-16 Thread Vivekanand Joshi
Hey guys, thanks a lot.

This is really helping. I am learning a lot. BTW, I changed the CTE into a
subquery and it improved the performance by miles. I am getting the result
in less than 3 seconds, though I am using a 24 GB RAM server. It is still a
great turnaround time compared to the previous execution time.

Now I will look into the bigger query. I read the EXPLAIN ANALYZE output and
that helped a lot. I will be coming up with more questions tomorrow, as the
bigger query still has some problems.
On 16 Mar 2015 23:55, Tomas Vondra tomas.von...@2ndquadrant.com wrote:

 On 16.3.2015 18:49, Marc Mamin wrote:
 
  Hi Team,
 
  This is the EXPLAIN ANALYZE for one of the view :
 S_V_D_CAMPAIGN_HIERARCHY:

 FWIW, this is a somewhat more readable version of the plan:

 http://explain.depesz.com/s/nbB

 In the future, please do three things:

 (1) Attach the plan as a text file, because the mail clients tend to
 screw things up (wrapping long lines). Unless the plan is trivial,
 of course - but pgsql-performance usually deals with complex stuff.

 (2) Putting the plan on explain.depesz.com helps too, because it's
 considerably more readable (but always do 1 as well, because resources
 placed somewhere else tend to disappear, and the posts then make
 very little sense, which is bad when searching the archives)

 (3) Same for stuff pasted somewhere else - always attach it to the
 message. For example I'd like to give you more accurate advice, but
 I can't as http://pgsql.privatepaste.com/41207bea45 is unavailable.

 
 
Rows Removed by Join Filter: 3577676116
 
That's quite a lot.
You're possibly missing a clause in a join, resulting in a cross
 join.
It is also helpful to put your result here:
http://explain.depesz.com/
regards,

 IMHO this is merely a consequence of using the CTE, which produces 52997
 rows and is scanned 67508x as the inner relation of a nested loop. That
 gives you 3577721476 tuples in total, and only 45360 are kept (hence
 3577676116 are removed).

 This is a prime example of why CTEs are not just aliases for subqueries,
 but may actually cause serious trouble.

 There are other issues (e.g. the row count estimate of the CTE is
 seriously off, most likely because of the HashAggregate in the outer
 branch), but that's a secondary issue IMHO.

 Vivekanand, try this (in the order of intrusiveness):

 (1) Get rid of the CTE, and just replace it with subselect in the FROM
 part of the query, so instead of this:

 WITH valid_executions AS (...)
 SELECT ... FROM ... JOIN valid_executions ON (...)

 you'll have something like this:

 SELECT ... FROM ... JOIN (...) AS valid_executions ON (...)

 This way the subselect will be optimized properly.


 (2) Replace the CTE with a materialized view, or a temporary table.
 This has both advantages and disadvantages - the main advantage is
 that you can create indexes, collect statistics. Disadvantage is
 you have to refresh the MV, fill temporary table etc.

 I expect (1) to improve the performance significantly, and (2) might
 improve it even further by fixing the misestimates.


 regards

 --
 Tomas Vondra                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [PERFORM] MusicBrainz postgres performance issues

2015-03-16 Thread Joao Junior
Robert, many thanks for the feedback!

Could you post your new pgbouncer config file?

How many PostgreSQL processes do you have now at the OS level with this new
config?

How many clients from the app server hit your pgbouncer?


Regards,

2015-03-16 11:32 GMT-03:00 Robert Kaye r...@musicbrainz.org:


  On Mar 16, 2015, at 2:22 PM, Thomas Kellerer spam_ea...@gmx.net wrote:
 
  I think it would be nice if you can amend your blog posting to include
 the solution that you found.
 
  Otherwise this will simply stick around as yet another unsolved
 performance problem


 Good thinking:

   http://blog.musicbrainz.org/2015/03/16/postgres-troubles-resolved/

 I’ve also updated the original post with the link to the above. Case
 closed. :)

 --

 --ruaok

 Robert Kaye -- r...@musicbrainz.org --
 http://musicbrainz.org






Re: [PERFORM] EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

2015-03-16 Thread Jim Nasby

On 3/11/15 11:15 AM, Tom Lane wrote:

Gunnlaugur Thor Briem gunnlau...@gmail.com writes:

Yes, I think that's it: I've just realized that immediately prior to the
INSERT, in the same transaction, an unfiltered DELETE has been issued; i.e.
the whole table is being rewritten. Then the INSERT is issued ... with a
WHERE clause on non-existence in the (now empty) table.



In that case of course the WHERE clause is unnecessary, as it will always
evaluate as true (and we've locked the whole table for writes). Looks like
it is a lot worse than unnecessary, though, if it triggers this performance
snafu in EXPLAIN INSERT.


Ah-hah.  So what's happening is that the planner is doing an indexscan
over the entire table of now-dead rows, looking vainly for an undeleted
maximal row.  Ouch.

I wonder how hard it would be to make the indexscan give up after hitting
N consecutive dead rows, for some suitable N, maybe ~1000.  From the
planner's viewpoint it'd be easy enough to fall back to using whatever
it had in the histogram after all.  But that's all happening down inside
index_getnext, and I'm hesitant to stick some kind of wart into that
machinery for this purpose.


ISTM what we really want here is a time-based behavior, not number of 
rows. Given that, could we do the index probe in a subtransaction, set 
an alarm for X ms, and simply abort the subtransaction if the alarm fires?

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [PERFORM] Performance issues

2015-03-16 Thread Tomas Vondra
On 16.3.2015 18:49, Marc Mamin wrote:
 
 Hi Team,

 This is the EXPLAIN ANALYZE for one of the view : S_V_D_CAMPAIGN_HIERARCHY:

FWIW, this is a somewhat more readable version of the plan:

http://explain.depesz.com/s/nbB

In the future, please do three things:

(1) Attach the plan as a text file, because the mail clients tend to
screw things up (wrapping long lines). Unless the plan is trivial,
of course - but pgsql-performance usually deals with complex stuff.

(2) Putting the plan on explain.depesz.com helps too, because it's
considerably more readable (but always do 1 as well, because resources
placed somewhere else tend to disappear, and the posts then make
very little sense, which is bad when searching the archives)

(3) Same for stuff pasted somewhere else - always attach it to the
message. For example I'd like to give you more accurate advice, but
I can't as http://pgsql.privatepaste.com/41207bea45 is unavailable.

 
 
   Rows Removed by Join Filter: 3577676116
 
   That's quite a lot.
   You're possibly missing a clause in a join, resulting in a cross join.
   It is also helpful to put your result here:
   http://explain.depesz.com/
   regards,

IMHO this is merely a consequence of using the CTE, which produces 52997
rows and is scanned 67508x as the inner relation of a nested loop. That
gives you 3577721476 tuples in total, and only 45360 are kept (hence
3577676116 are removed).

This is a prime example of why CTEs are not just aliases for subqueries,
but may actually cause serious trouble.

There are other issues (e.g. the row count estimate of the CTE is
seriously off, most likely because of the HashAggregate in the outer
branch), but that's a secondary issue IMHO.

Vivekanand, try this (in the order of intrusiveness):

(1) Get rid of the CTE, and just replace it with a subselect in the FROM
part of the query, so instead of this:

WITH valid_executions AS (...)
SELECT ... FROM ... JOIN valid_executions ON (...)

you'll have something like this:

SELECT ... FROM ... JOIN (...) AS valid_executions ON (...)

This way the subselect will be optimized properly.


(2) Replace the CTE with a materialized view, or a temporary table.
This has both advantages and disadvantages - the main advantage is
that you can create indexes and collect statistics; the disadvantage is
that you have to refresh the MV, fill the temporary table, etc.

I expect (1) to improve the performance significantly, and (2) might
improve it even further by fixing the misestimates.
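
To illustrate option (2), here is a sketch pieced together from the column
names visible in the plan above - treat it as a guess at the real definition,
not the actual one:

CREATE MATERIALIZED VIEW valid_executions AS
SELECT h.touchpoint_execution_id
FROM   s_f_touchpoint_execution_status_history h
JOIN  (SELECT touchpoint_execution_id, max(creation_dt) AS creation_dt
       FROM   s_f_touchpoint_execution_status_history
       GROUP  BY touchpoint_execution_id) last_status
  ON  last_status.touchpoint_execution_id = h.touchpoint_execution_id
 AND  last_status.creation_dt = h.creation_dt
WHERE  h.touchpoint_execution_status_type_id = ANY ('{3,4}'::integer[]);

CREATE INDEX ON valid_executions (touchpoint_execution_id);
ANALYZE valid_executions;

-- must be refreshed whenever the underlying history table changes
REFRESH MATERIALIZED VIEW valid_executions;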


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Bad cost estimate with FALSE filter condition

2015-03-16 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 So ... should I assume my diagnosis is correct?  Haven't heard any other
 suggestions.

I don't see any reason to think this is worth worrying about, or worth
spending planner cycles on to produce a cosmetically nicer cost estimate.
One-time filters always apply at the top plan level so they're unlikely
to change any planner choices.  Moreover, for any case other than the
not-terribly-interesting constant FALSE case, we're better off assuming
that the filter condition will be true (and so there's nothing to adjust).

regards, tom lane




Re: [PERFORM] Bad cost estimate with FALSE filter condition

2015-03-16 Thread Josh Berkus
On 03/16/2015 11:26 AM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 So ... should I assume my diagnosis is correct?  Haven't heard any other
 suggestions.
 
 I don't see any reason to think this is worth worrying about, or worth
 spending planner cycles on to produce a cosmetically nicer cost estimate.

I wouldn't say it's critical, but there's two issues:

1) users are confused when they see the plan, especially if it's chosen
in preference to a lower-cost plan.  It's counter-intuitive for EXPLAIN
to not display the real estimated cost.

2) Tools which attempt to do some kind of useful aggregation or event
handling around estimated plan cost have to write special workarounds
for these cases.

Is there anything *useful* about the existing behavior such that we'd
like to preserve it?  Or is it just a matter of Nobody's Submitted A
Patch Yet?

I ask because I'm thinking about a patch, so if changing this will break
a lot of stuff, that's a good thing to know.

 One-time filters always apply at the top plan level so they're unlikely
 to change any planner choices.  Moreover, for any case other than the
 not-terribly-interesting constant FALSE case, we're better off assuming
 that the filter condition will be true (and so there's nothing to adjust).

Except that we *don't* get back the same estimate for a TRUE filter
condition.
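
For example, against the same pgbench schema as above:

-- the constant TRUE is folded away, so this reports the cheap index-scan
-- estimate, the same as the plain "where aid = 2" plan
explain select * from pgbench_accounts where aid = 2 and true;

-- the constant FALSE becomes a One-Time Filter, and the reported cost is the
-- full seq-scan cost that will never actually be paid
explain select * from pgbench_accounts where aid = 2 and false;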

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [PERFORM] EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

2015-03-16 Thread Jim Nasby

On 3/11/15 10:54 AM, Gunnlaugur Thor Briem wrote:

(Even better, just make the new table not temporary, and have it replace
the former table altogether. But that's for later; requires some broader
changes in our application.)


The other thing you should consider is using TRUNCATE instead of an 
un-filtered DELETE. It will both be much faster to perform and won't 
leave any dead rows behind.
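
A minimal illustration of the difference (table name made up):

-- scans and deletes row by row, leaving every old row behind as a dead tuple
-- for VACUUM to clean up later
DELETE FROM staging_items;

-- effectively instant and reclaims the space immediately, but takes an
-- ACCESS EXCLUSIVE lock, so concurrent readers are locked out until commit
TRUNCATE staging_items;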

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [PERFORM] Best VPS provider for running performant PostgreSQL database server

2015-03-16 Thread Jim Nasby

On 3/16/15 12:08 AM, Some Developer wrote:

I wasn't sure whether to post this in general, admin or performance but
since it is basically a performance question I went with performance.

I'm about to launch a new website that is written using the Django web
framework and has PostgreSQL as the database server. Unfortunately I
can't afford to get dedicated hardware at the launch of the website as I
won't be making money off it for a couple of months (maybe longer).

So I was wondering if anyone had any recommendations for decent VPS
providers that have good hardware specs for running a PostgreSQL server?
I'll be using PostgreSQL 9.4.

The database is likely to be quite small (under 1GB) for quite some time
so should I go for double the size of the database in RAM so I can fit
it all in memory if required? The database will be mainly read only with
only a small number of writes (although as new features are added the
number of database write operations will increase).


That's probably your best bet. If you go that route then IO performance 
basically shouldn't matter. That means that instead of spending money 
for a VPS you could just use a cheap EC2 instance.
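
For what it's worth, a quick way to sanity-check the fits-in-RAM idea (the
settings shown are purely illustrative):

-- how big is the database right now?
SELECT pg_size_pretty(pg_database_size(current_database()));

-- with a ~1GB database on, say, a 4GB instance, something along these lines
-- keeps the whole working set cached (restart required for shared_buffers)
ALTER SYSTEM SET shared_buffers = '1GB';
ALTER SYSTEM SET effective_cache_size = '3GB';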



I guess SSDs are essential these days but am I right about the amount of
RAM? Is there anything else I should be looking out for? I'll just be
running PostgreSQL on the VPS; the web server and app server will be run
on different VPSs.


SSD is in no way essential. It's all a question of what your needs are, 
and from how you're describing it right now your needs are extremely modest.


One thing you absolutely should do, however, is have at least one hot 
standby. That's an absolute must with services like EC2, where a node can 
just vanish, and it's still a good idea with a VPS.



In the past I've used Linode, Digital Ocean, Vultr and RamNode. I've
become disheartened by Digital Ocean so don't want to use them for this
project.


You should take a look at 
https://github.com/manageacloud/cloud-benchmark-postgres and 
https://www.youtube.com/watch?v=JtORBqQdKHY

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [PERFORM] MusicBrainz postgres performance issues

2015-03-16 Thread Jim Nasby

On 3/15/15 7:17 PM, mich...@sqlexec.com wrote:
Please avoid top-posting.


I agree with your counter-argument about how high max_connections can
cause problems, but max_connections may not be part of the problem here.
There's a bunch of "it depends" stuff in there based on workload details,
number of CPUs, RAM, etc.


Sure, but the big, huge danger with a very large max_connections is that 
you now have a large grenade with the pin pulled out. If *anything* 
happens to disturb the server and push the active connection count past 
the number of actual cores, the box is going to fall over and not recover.


In contrast, if max_connections is <= the number of cores, this is far 
less likely to happen. Each connection will get a CPU to run on, and as 
long as they're not all clamoring for the same locks the server will be 
making forward progress. Clients may have to wait in the pool for a free 
connection for some time, but once they get one their work will get done.



I'm still waiting to find out how many CPUs are on this DB server. Did I
miss it somewhere in the email thread below?


http://blog.musicbrainz.org/2015/03/15/postgres-troubles/ might show it 
somewhere...

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [PERFORM] Performance issues

2015-03-16 Thread Jim Nasby

On 3/13/15 7:12 PM, Tomas Vondra wrote:

(4) I suspect many of the relations referenced in the views are not
 actually needed in the query, i.e. the join is performed but
 then it's just discarded because those columns are not used.
 Try to simplify the views as much as possible - remove all the
 tables that are not really necessary to run the query. If two
 queries need different tables, maybe defining two views is
 a better approach.


A better alternative with multi-purpose views is to use an outer join 
instead of an inner join. With an outer join, if you ultimately don't 
refer to any of the columns in a particular table, Postgres will remove 
the table from the query completely.
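
A rough sketch of the idea (names made up); note that the planner can only
drop the join when the outer-joined table is provably unique on the join
column, e.g. via a primary key:

CREATE VIEW orders_wide AS
SELECT o.*,
       c.name AS customer_name          -- customers.customer_id is the primary key
FROM   orders o
LEFT   JOIN customers c ON c.customer_id = o.customer_id;

-- this query never touches customer_name, so the join to customers
-- is removed from the plan entirely
SELECT order_id, total FROM orders_wide;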

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

