Re: [PERFORM] Sudden connection and load average spikes with postgresql 9.3

2015-07-07 Thread eudald_v
Hello guys!

I finally got rid of it.
It looks that at the end it was all due to transparent_hugepages values.

I disabled them and cpu spikes disappeared. I am sorry cause it's something
I usually disable on postgresql servers, but I forgot to do so on this one
and never thought about it.

Thanks a lot for all your helpful messages!

Eudald



--
View this message in context: 
http://postgresql.nabble.com/Sudden-connection-and-load-average-spikes-with-postgresql-9-3-tp5855895p5856914.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sudden connection and load average spikes with postgresql 9.3

2015-07-02 Thread eudald_v
Dear Josh,
I'm sorry I didn't write before, but we have been very busy with this issue
and, you know, when something goes wrong, the apocalypse comes with it.

I've been working on everything you suggested.

I used your tables and script and I can give you a sample of it on
locked_query_start
 2015-07-02 14:49:45.972129+02 |   15314 |  |   4001 |  
 
| TABLE_Z | tuple| ExclusiveLock | 24018:24  | relation   
| ShareUpdateExclusiveLock |  | | YYY.YYY.YYY.YYY/32
| 2015-07-02 14:49:26.635599+02 | 2015-07-02 14:49:26.635599+02 | 2015-07-02
14:49:26.635601+02 | INSERT INTO TABLE_X (field1, field2, field3,
field4, field5, field6, field7) VALUES (22359509, 92, 5, 88713,
'XXX.XXX.XXX.XXX', 199, 10)  ||  | 2015-07-02
14:11:45.368709+02 | 2015-07-02 14:11:45.368709+02 | active   |
2015-07-02 14:11:45.36871+02  | autovacuum: VACUUM ANALYZE public.TABLE_Z

 2015-07-02 14:49:45.972129+02 |   15857 |  |   4001 |  
 
| TABLE_Z | tuple| ExclusiveLock | 24018:24  | relation   
| ShareUpdateExclusiveLock |  | | YYY.YYY.YYY.YYY/32
| 2015-07-02 14:49:22.79166+02  | 2015-07-02 14:49:22.79166+02  | 2015-07-02
14:49:22.791665+02 | INSERT INTO TABLE_X (field1, field2, field3,
field4, field5, field6, field7) VALUES (14515978, 92, 5, 88713,
'XXX.XXX.XXX.XXX', 199, 10) ||  | 2015-07-02
14:11:45.368709+02 | 2015-07-02 14:11:45.368709+02 | active   |
2015-07-02 14:11:45.36871+02  | autovacuum: VACUUM ANALYZE public.TABLE_Z

2015-07-02 14:49:45.972129+02 |   15314 |  |  14712 |   
| TABLE_Z | tuple| ExclusiveLock | 24018:24  | relation   
| AccessShareLock  |  | |
1YYY.YYY.YYY.YYY/32 | 2015-07-02 14:49:26.635599+02 | 2015-07-02
14:49:26.635599+02 | 2015-07-02 14:49:26.635601+02 | INSERT INTO TABLE_X
(field1, field2, field3, field4, field5, field6, field) VALUES
(22359509, 92, 5, 88713, 'XXX.XXX.XXX.XXX', 199, 10)  ||
185.10.253.72/32 | 2015-07-02 14:48:48.841375+02 | 2015-07-02
14:48:48.841375+02 | active   | 2015-07-02 14:48:48.841384+02 | INSERT
INTO TABLE_Y (email_id, sendout_id, feed_id, isp_id)  VALUES
(46015879, 75471, 419, 0)

All that was recorded during a spike. From this log I have to point
something:
Tables TABLE_X and TABLE_Y have both a TRIGGER that does an INSERT to
TABLE_Z
As you can see, TABLE_Z was being VACUUM ANALYZED. I wonder if TRIGGERS and
VACUUM work well together, just to check another perspective.

We also have carefully looked at our scripts and we have performed some code
optimitzations (like close db connections earlier), but the spikes continue
to happen.

FS is ext4 and I don't know how can I check the transaction log
configuration

This is how IO lookslike before and after any problem happens:
Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
sda 357,00  7468,00  8840,00   7468   8840
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   5,020,002,440,060,00   92,47
Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
sda 471,00  7032,00 13760,00   7032  13760
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   5,140,002,920,030,00   91,92
Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
sda 376,00  7192,00  8048,00   7192   8048
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   4,770,002,570,030,00   92,63
Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
sda 304,00  7280,00  8252,00   7280   8252

And this is how it looks like when the spike happens:
http://pastebin.com/2hAYuDZ5

Hope it can help into determining what's happening.

Thanks for all your efforts and collaboration!
Eudald



--
View this message in context: 
http://postgresql.nabble.com/Sudden-connection-and-load-average-spikes-with-postgresql-9-3-tp5855895p5856298.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sudden connection and load average spikes with postgresql 9.3

2015-06-30 Thread eudald_v
Dear Tom,
Thanks for your fast approach.

First of all, yes, queries seems to take more time to process and they are
like queued up (you can even see inserts with status waiting on top/htop).

I didn't know about that connection tip, and I will absolutely find a moment
to add a pg_pooler to reduce the amount of active/idle connections shown at
the database.

I'm monitoring the pg_stat_activity table and when connections raise there's
no query being processed that could block a table and cause the other
queries to stack waiting. I'm checking pg_lock table too, with results to be
determined yet.

At least for the moment, it has spiked up once again (it's happening with
more frequency). I was able to catch this log from I/O. These are 24 300Gb
SAS disks on Raid 10 with hot spare.

Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
sda  42,00   136,0096,00136 96
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  14,820,00   52,750,610,00   31,82
Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
sda 598,00  1764,00 15188,00   1764  15188
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  17,790,00   82,190,010,000,01
Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
sda 230,0028,00  6172,00 28   6172
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  26,790,00   72,490,120,000,60
Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
sda 200,0096,00  8364,00 96   8364
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  17,880,00   81,950,010,000,15
Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
sda  87,0048,00  4164,00 48   4164
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  26,090,00   66,861,690,005,36
Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
sda 563,00  1932,00  9004,00   1932   9004
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  19,300,00   80,380,100,000,21
Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
sda 159,00   348,00  5292,00348   5292
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  28,150,00   68,030,190,003,63
Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
sda 250,00   196,00  7388,00196   7388
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  11,030,00   38,680,140,00   50,16
Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
sda 294,00  1508,00  2916,00   1508   2916

Hope it helps. If you feel we should reduce shared_buffers, what would be a
nice value?

Thanks!



--
View this message in context: 
http://postgresql.nabble.com/Sudden-connection-and-load-average-spikes-with-postgresql-9-3-tp5855895p5855900.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Sudden connection and load average spikes with postgresql 9.3

2015-06-30 Thread eudald_v
Hello all,
This is my very first message to the Postgresql community, and I really hope
you can help me solve the trouble I'm facing.

I've an 80 core server (multithread) with close to 500GB RAM.

My configuration is:
MaxConn: 1500 (was 850)
Shared buffers: 188Gb
work_mem: 110Mb (was 220Mb)
maintenance_work_mem: 256Mb
effective_cache_size: 340Gb

The database is running under postgresql 9.3.9 on an Ubuntu Server 14.04 LTS
(build 3.13.0-55-generic)

Two days from now, I've been experiencing that, randomly, the connections
rise up till they reach max connections, and the load average of the server
goes arround 300~400, making every command issued on the server take
forever. When this happens, ram is relatively low (70Gb used), cores
activity is lower than usual and sometimes swap happens (I've swappiness
configured to 10%)

I've been trying to find the cause of this server underperformance, even
logging all queries in debug mode, but nothing strange found so far.

I really don't know which can be my next step to try to isolate the problem
and that's why I write to you guys. Have you ever seen this behaviour
before?
Could you kindly help me suggesting any step to follow?

Thanks and best regards!
E.v.



--
View this message in context: 
http://postgresql.nabble.com/Sudden-connection-and-load-average-spikes-with-postgresql-9-3-tp5855895.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance