Re: [PERFORM] Sudden connection and load average spikes with postgresql 9.3
Hello guys! I finally got rid of it. It looks that at the end it was all due to transparent_hugepages values. I disabled them and cpu spikes disappeared. I am sorry cause it's something I usually disable on postgresql servers, but I forgot to do so on this one and never thought about it. Thanks a lot for all your helpful messages! Eudald -- View this message in context: http://postgresql.nabble.com/Sudden-connection-and-load-average-spikes-with-postgresql-9-3-tp5855895p5856914.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Sudden connection and load average spikes with postgresql 9.3
Dear Josh, I'm sorry I didn't write before, but we have been very busy with this issue and, you know, when something goes wrong, the apocalypse comes with it. I've been working on everything you suggested. I used your tables and script and I can give you a sample of it on locked_query_start 2015-07-02 14:49:45.972129+02 | 15314 | | 4001 | | TABLE_Z | tuple| ExclusiveLock | 24018:24 | relation | ShareUpdateExclusiveLock | | | YYY.YYY.YYY.YYY/32 | 2015-07-02 14:49:26.635599+02 | 2015-07-02 14:49:26.635599+02 | 2015-07-02 14:49:26.635601+02 | INSERT INTO TABLE_X (field1, field2, field3, field4, field5, field6, field7) VALUES (22359509, 92, 5, 88713, 'XXX.XXX.XXX.XXX', 199, 10) || | 2015-07-02 14:11:45.368709+02 | 2015-07-02 14:11:45.368709+02 | active | 2015-07-02 14:11:45.36871+02 | autovacuum: VACUUM ANALYZE public.TABLE_Z 2015-07-02 14:49:45.972129+02 | 15857 | | 4001 | | TABLE_Z | tuple| ExclusiveLock | 24018:24 | relation | ShareUpdateExclusiveLock | | | YYY.YYY.YYY.YYY/32 | 2015-07-02 14:49:22.79166+02 | 2015-07-02 14:49:22.79166+02 | 2015-07-02 14:49:22.791665+02 | INSERT INTO TABLE_X (field1, field2, field3, field4, field5, field6, field7) VALUES (14515978, 92, 5, 88713, 'XXX.XXX.XXX.XXX', 199, 10) || | 2015-07-02 14:11:45.368709+02 | 2015-07-02 14:11:45.368709+02 | active | 2015-07-02 14:11:45.36871+02 | autovacuum: VACUUM ANALYZE public.TABLE_Z 2015-07-02 14:49:45.972129+02 | 15314 | | 14712 | | TABLE_Z | tuple| ExclusiveLock | 24018:24 | relation | AccessShareLock | | | 1YYY.YYY.YYY.YYY/32 | 2015-07-02 14:49:26.635599+02 | 2015-07-02 14:49:26.635599+02 | 2015-07-02 14:49:26.635601+02 | INSERT INTO TABLE_X (field1, field2, field3, field4, field5, field6, field) VALUES (22359509, 92, 5, 88713, 'XXX.XXX.XXX.XXX', 199, 10) || 185.10.253.72/32 | 2015-07-02 14:48:48.841375+02 | 2015-07-02 14:48:48.841375+02 | active | 2015-07-02 14:48:48.841384+02 | INSERT INTO TABLE_Y (email_id, sendout_id, feed_id, isp_id) VALUES (46015879, 75471, 419, 0) All that was recorded during a spike. From this log I have to point something: Tables TABLE_X and TABLE_Y have both a TRIGGER that does an INSERT to TABLE_Z As you can see, TABLE_Z was being VACUUM ANALYZED. I wonder if TRIGGERS and VACUUM work well together, just to check another perspective. We also have carefully looked at our scripts and we have performed some code optimitzations (like close db connections earlier), but the spikes continue to happen. FS is ext4 and I don't know how can I check the transaction log configuration This is how IO lookslike before and after any problem happens: Device:tpskB_read/skB_wrtn/skB_readkB_wrtn sda 357,00 7468,00 8840,00 7468 8840 avg-cpu: %user %nice %system %iowait %steal %idle 5,020,002,440,060,00 92,47 Device:tpskB_read/skB_wrtn/skB_readkB_wrtn sda 471,00 7032,00 13760,00 7032 13760 avg-cpu: %user %nice %system %iowait %steal %idle 5,140,002,920,030,00 91,92 Device:tpskB_read/skB_wrtn/skB_readkB_wrtn sda 376,00 7192,00 8048,00 7192 8048 avg-cpu: %user %nice %system %iowait %steal %idle 4,770,002,570,030,00 92,63 Device:tpskB_read/skB_wrtn/skB_readkB_wrtn sda 304,00 7280,00 8252,00 7280 8252 And this is how it looks like when the spike happens: http://pastebin.com/2hAYuDZ5 Hope it can help into determining what's happening. Thanks for all your efforts and collaboration! Eudald -- View this message in context: http://postgresql.nabble.com/Sudden-connection-and-load-average-spikes-with-postgresql-9-3-tp5855895p5856298.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Sudden connection and load average spikes with postgresql 9.3
Dear Tom, Thanks for your fast approach. First of all, yes, queries seems to take more time to process and they are like queued up (you can even see inserts with status waiting on top/htop). I didn't know about that connection tip, and I will absolutely find a moment to add a pg_pooler to reduce the amount of active/idle connections shown at the database. I'm monitoring the pg_stat_activity table and when connections raise there's no query being processed that could block a table and cause the other queries to stack waiting. I'm checking pg_lock table too, with results to be determined yet. At least for the moment, it has spiked up once again (it's happening with more frequency). I was able to catch this log from I/O. These are 24 300Gb SAS disks on Raid 10 with hot spare. Device:tpskB_read/skB_wrtn/skB_readkB_wrtn sda 42,00 136,0096,00136 96 avg-cpu: %user %nice %system %iowait %steal %idle 14,820,00 52,750,610,00 31,82 Device:tpskB_read/skB_wrtn/skB_readkB_wrtn sda 598,00 1764,00 15188,00 1764 15188 avg-cpu: %user %nice %system %iowait %steal %idle 17,790,00 82,190,010,000,01 Device:tpskB_read/skB_wrtn/skB_readkB_wrtn sda 230,0028,00 6172,00 28 6172 avg-cpu: %user %nice %system %iowait %steal %idle 26,790,00 72,490,120,000,60 Device:tpskB_read/skB_wrtn/skB_readkB_wrtn sda 200,0096,00 8364,00 96 8364 avg-cpu: %user %nice %system %iowait %steal %idle 17,880,00 81,950,010,000,15 Device:tpskB_read/skB_wrtn/skB_readkB_wrtn sda 87,0048,00 4164,00 48 4164 avg-cpu: %user %nice %system %iowait %steal %idle 26,090,00 66,861,690,005,36 Device:tpskB_read/skB_wrtn/skB_readkB_wrtn sda 563,00 1932,00 9004,00 1932 9004 avg-cpu: %user %nice %system %iowait %steal %idle 19,300,00 80,380,100,000,21 Device:tpskB_read/skB_wrtn/skB_readkB_wrtn sda 159,00 348,00 5292,00348 5292 avg-cpu: %user %nice %system %iowait %steal %idle 28,150,00 68,030,190,003,63 Device:tpskB_read/skB_wrtn/skB_readkB_wrtn sda 250,00 196,00 7388,00196 7388 avg-cpu: %user %nice %system %iowait %steal %idle 11,030,00 38,680,140,00 50,16 Device:tpskB_read/skB_wrtn/skB_readkB_wrtn sda 294,00 1508,00 2916,00 1508 2916 Hope it helps. If you feel we should reduce shared_buffers, what would be a nice value? Thanks! -- View this message in context: http://postgresql.nabble.com/Sudden-connection-and-load-average-spikes-with-postgresql-9-3-tp5855895p5855900.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Sudden connection and load average spikes with postgresql 9.3
Hello all, This is my very first message to the Postgresql community, and I really hope you can help me solve the trouble I'm facing. I've an 80 core server (multithread) with close to 500GB RAM. My configuration is: MaxConn: 1500 (was 850) Shared buffers: 188Gb work_mem: 110Mb (was 220Mb) maintenance_work_mem: 256Mb effective_cache_size: 340Gb The database is running under postgresql 9.3.9 on an Ubuntu Server 14.04 LTS (build 3.13.0-55-generic) Two days from now, I've been experiencing that, randomly, the connections rise up till they reach max connections, and the load average of the server goes arround 300~400, making every command issued on the server take forever. When this happens, ram is relatively low (70Gb used), cores activity is lower than usual and sometimes swap happens (I've swappiness configured to 10%) I've been trying to find the cause of this server underperformance, even logging all queries in debug mode, but nothing strange found so far. I really don't know which can be my next step to try to isolate the problem and that's why I write to you guys. Have you ever seen this behaviour before? Could you kindly help me suggesting any step to follow? Thanks and best regards! E.v. -- View this message in context: http://postgresql.nabble.com/Sudden-connection-and-load-average-spikes-with-postgresql-9-3-tp5855895.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance