Re: [PERFORM] MusicBrainz postgres performance issues
On Mon, Mar 16, 2015 at 6:59 AM, Robert Kaye <r...@musicbrainz.org> wrote:
> 4. Linux 3.2 apparently has some less than desirable swap behaviours. Once we started swapping, everything went nuts.

On older machines I used to just turn off swap altogether, especially if I wasn't running out of memory but swap was engaging anyway. swappiness = 0 didn't help; nothing did. I just kept seeing kswapd working its butt off doing nothing but hitting the swap partition. So glad to be off those old kernels.
Re: [PERFORM] MusicBrainz postgres performance issues
Robert Kaye wrote on 16.03.2015 at 13:59:
> However, I am glad to report that our problems are fixed and that our server is back to humming along nicely.
>
> And as I said to Josh earlier: "Postgres rocks our world. I'm immensely pleased that once again the problems were our own stupidity and not PG's fault. In over 10 years of us using PG, it has never been PG's fault. Not once."
>
> And thus we're one tiny bit smarter today. Thank you everyone!

I think it would be nice if you could amend your blog posting to include the solution that you found. Otherwise this will simply stick around as yet another unsolved performance problem.

Thomas
Re: [PERFORM] MusicBrainz postgres performance issues
On Mar 16, 2015, at 2:22 PM, Thomas Kellerer <spam_ea...@gmx.net> wrote:
> I think it would be nice if you could amend your blog posting to include the solution that you found. Otherwise this will simply stick around as yet another unsolved performance problem.

Good thinking: http://blog.musicbrainz.org/2015/03/16/postgres-troubles-resolved/

I've also updated the original post with the link to the above. Case closed. :)

--
--ruaok  Robert Kaye -- r...@musicbrainz.org -- http://musicbrainz.org
Re: [PERFORM] MusicBrainz postgres performance issues
Robert Kaye <r...@musicbrainz.org> wrote:
> However, I am glad to report that our problems are fixed and that our server is back to humming along nicely. What we changed:
>
> 1. As it was pointed out here, max_connections of 500 was in fact insanely high, especially in light of using PgBouncer. Before we used PgBouncer we needed a lot more connections, and when we started using PgBouncer, we never reduced this number.
>
> 2. Our server_lifetime was set far too high (1 hour). Josh Berkus suggested lowering that to 5 minutes.
>
> 3. We reduced the number of PgBouncer active connections to the DB.

Many thanks for the feedback!

Andreas
Re: [PERFORM] MusicBrainz postgres performance issues
On March 16, 2015 at 3:24:34 AM, Roxanne Reid-Bennett (r...@tara-lu.com) wrote:
> Robert, Wow - You've engaged the wizards indeed. I haven't heard or seen anything that would answer my *second* question if faced with this (my first would have been "what changed?")

Yes, indeed -- I feel honored to have so many people chime in on this issue.

The problem was that nothing abnormal was happening: just the normal queries were running that hadn't given us any problems for months. We undid everything that had been recently changed in an effort to address "what changed". Nothing helped, which is what had us so perplexed.

However, I am glad to report that our problems are fixed and that our server is back to humming along nicely. What we changed:

1. As it was pointed out here, max_connections of 500 was in fact insanely high, especially in light of using PgBouncer. Before we used PgBouncer we needed a lot more connections, and when we started using PgBouncer, we never reduced this number.

2. Our server_lifetime was set far too high (1 hour). Josh Berkus suggested lowering that to 5 minutes.

3. We reduced the number of PgBouncer active connections to the DB.

What we learned:

1. We had too many backends.

2. The backends were being kept around for too long by PgBouncer.

3. This caused too many idle backends to kick around. Once we exhausted physical RAM, we started swapping.

4. Linux 3.2 apparently has some less than desirable swap behaviours. Once we started swapping, everything went nuts.

Going forward we're going to upgrade our kernel the next time we have downtime for our site, and the rest should be sorted now.

I wanted to thank everyone who contributed their thoughts to this thread -- THANK YOU.

And as I said to Josh earlier: "Postgres rocks our world. I'm immensely pleased that once again the problems were our own stupidity and not PG's fault. In over 10 years of us using PG, it has never been PG's fault. Not once."

And thus we're one tiny bit smarter today. Thank you everyone!

P.S. If anyone would still like to get some more information about this problem for their own edification, please let me know. Given that we've fixed the issue, I don't want to spam this list by responding to all the questions that were posed.

--
--ruaok  Robert Kaye -- r...@musicbrainz.org -- http://musicbrainz.org
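[For reference, the three changes map onto configuration roughly as in the sketch below. The values are illustrative guesses, not MusicBrainz's actual settings:]

    ; pgbouncer.ini (sketch)
    [pgbouncer]
    server_lifetime = 300      ; close idle server connections after 5 minutes (was 3600)
    default_pool_size = 25     ; fewer active backend connections per database/user pair

    # postgresql.conf (sketch)
    max_connections = 100      # down from 500; PgBouncer queues the rest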
Re: [PERFORM] MusicBrainz postgres performance issues
On 03/16/2015 05:59 AM, Robert Kaye wrote:
> 4. Linux 3.2 apparently has some less than desirable swap behaviours. Once we started swapping, everything went nuts.

Relevant to this: http://www.databasesoup.com/2014/09/why-you-need-to-avoid-linux-kernel-32.html

Anybody who is on Linux kernels 3.0 to 3.8 really needs to upgrade soon.

--
Josh Berkus, PostgreSQL Experts Inc. -- http://pgexperts.com
Re: [PERFORM] MusicBrainz postgres performance issues
Robert, many thanks for the feedback! Could you post your new pgbouncer config file? How many PostgreSQL processes do you have now at the OS with this new conf? How many clients from the app server hit your pgbouncer?

Regards,

2015-03-16 11:32 GMT-03:00 Robert Kaye <r...@musicbrainz.org>:
> Good thinking: http://blog.musicbrainz.org/2015/03/16/postgres-troubles-resolved/
>
> I've also updated the original post with the link to the above. Case closed. :)
Re: [PERFORM] MusicBrainz postgres performance issues
On 3/15/15 7:17 PM, mich...@sqlexec.com wrote:
> I agree with your counter argument about how high max_connections can cause problems, but max_connections may not be part of the problem here. There's a bunch of "depends" stuff in there based on workload details, # CPUs, RAM, etc.

Please avoid top-posting.

Sure, but the big, huge danger with a very large max_connections is that you now have a large grenade with the pin pulled out. If *anything* happens to disturb the server and push the active connection count past the number of actual cores, the box is going to fall over and not recover.

In contrast, if max_connections is <= the number of cores this is far less likely to happen. Each connection will get a CPU to run on, and as long as they're not all clamoring for the same locks the server will be making forward progress. Clients may have to wait in the pool for a free connection for some time, but once they get one their work will get done.

> I'm still waiting to find out how many CPUs on this DB server. Did I miss it somewhere in the email thread below?

http://blog.musicbrainz.org/2015/03/15/postgres-troubles/ might show it somewhere...

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [PERFORM] MusicBrainz postgres performance issues
On Mar 15, 2015, at 13:45, Josh Krupka <jkru...@gmail.com> wrote:
> Hmm, that's definitely odd that it's swapping since it has plenty of free memory at the moment. Is it still under heavy load right now? Has the output of free consistently looked like that during your trouble times?

And it seems better to disable swappiness.
[PERFORM] MusicBrainz postgres performance issues
Hi!

We at MusicBrainz have been having trouble with our Postgres install for the past few days. I've collected all the relevant information here:

    http://blog.musicbrainz.org/2015/03/15/postgres-troubles/

If anyone could provide tips, suggestions or other relevant advice for what to poke at next, we would love it.

Thanks!

--
--ruaok  Robert Kaye -- r...@musicbrainz.org -- http://musicbrainz.org
Re: [PERFORM] MusicBrainz postgres performance issues
On Mar 15, 2015, at 12:13 PM, Josh Krupka <jkru...@gmail.com> wrote:
> It sounds like you've hit the postgres basics; what about some of the linux checklist items? What does free -m show on your db server?

                 total       used       free     shared    buffers     cached
    Mem:         48295      31673      16622          0          5      12670
    -/+ buffers/cache:      18997      29298
    Swap:        22852       2382      20470

> If the load problem really is being caused by swapping when things really shouldn't be swapping, it could be a matter of adjusting your swappiness - what does cat /proc/sys/vm/swappiness show on your server?

0. We adjusted that too, but no effect. (I've updated the blog post with these two comments.)

> There are other linux memory management things that can cause postgres and the server running it to throw fits, like THP and zone reclaim. I don't have enough info about your system to say they are the cause either, but check out the many postings here and other places on the detrimental effect that those settings *can* have. That would at least give you another angle to investigate.

If there are specific things you'd like to know, I'd be happy to be a human proxy. :)

Thanks!

--
--ruaok  Robert Kaye -- r...@musicbrainz.org -- http://musicbrainz.org
Re: [PERFORM] MusicBrainz postgres performance issues
On 2015-03-15 13:08:13 +0100, Robert Kaye wrote:
> On Mar 15, 2015, at 12:41 PM, Andreas Kretschmer <akretsch...@spamfence.net> wrote:
>> just a wild guess: raid-controller BBU faulty
>
> We don't have a BBU in this server, but at least we have redundant power supplies. In any case, how would a faulty battery possibly cause this?

Many controllers disable write-back caching when the battery is dead.

Greetings,

Andres Freund

--
Andres Freund -- http://www.2ndQuadrant.com/ -- PostgreSQL Development, 24x7 Support, Training & Services
Re: [PERFORM] MusicBrainz postgres performance issues
Please check this if it helps: http://ubuntuforums.org/showthread.php?t=2258734

On 2015/3/15 18:54, Robert Kaye wrote:
> Hi! We at MusicBrainz have been having trouble with our Postgres install for the past few days. I've collected all the relevant information here: http://blog.musicbrainz.org/2015/03/15/postgres-troubles/ If anyone could provide tips, suggestions or other relevant advice for what to poke at next, we would love it. Thanks!
Re: [PERFORM] MusicBrainz postgres performance issues
On Sun, Mar 15, 2015 at 8:07 AM, Robert Kaye <r...@musicbrainz.org> wrote:
>> What does free -m show on your db server?
>
>              total       used       free     shared    buffers     cached
> Mem:         48295      31673      16622          0          5      12670
> -/+ buffers/cache:      18997      29298
> Swap:        22852       2382      20470

Hmm, that's definitely odd that it's swapping since it has plenty of free memory at the moment. Is it still under heavy load right now? Has the output of free consistently looked like that during your trouble times?

>> If the load problem really is being caused by swapping when things really shouldn't be swapping, it could be a matter of adjusting your swappiness - what does cat /proc/sys/vm/swappiness show on your server?
>
> 0. We adjusted that too, but no effect. (I've updated the blog post with these two comments.)

Had that been updated a while ago, or just now?

>> There are other linux memory management things that can cause postgres and the server running it to throw fits, like THP and zone reclaim. [...]
>
> If there are specific things you'd like to know, I'd be happy to be a human proxy. :)

If zone reclaim is enabled (I think linux usually decides whether or not to enable it at boot time depending on the numa architecture), it sometimes avoids using memory on remote numa nodes if it thinks that memory access is too expensive. This can lead to way too much disk access (not sure if it would actually make linux swap or not...) and lots of ram sitting around doing nothing instead of being used for fs cache like it should be. Check to see if zone reclaim is enabled with this command: cat /proc/sys/vm/zone_reclaim_mode. If your server is a numa one, you can install the numactl utility and look at the numa layout with this: numactl --hardware

I'm not sure how THP would cause lots of swapping, but it's worth checking in general: cat /sys/kernel/mm/transparent_hugepage/enabled. If it's spending too much time trying to compact memory pages it can cause stalls in your processes. To get the THP metrics, do: egrep 'trans|thp' /proc/vmstat
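[For anyone following along, the checks named above condense into a short sequence; the paths are the standard Linux procfs/sysfs locations:]

    cat /proc/sys/vm/swappiness                        # how eagerly the kernel swaps
    cat /proc/sys/vm/zone_reclaim_mode                 # nonzero = NUMA zone reclaim on
    numactl --hardware                                 # NUMA node / memory layout
    cat /sys/kernel/mm/transparent_hugepage/enabled    # THP mode: [always]/madvise/never
    egrep 'trans|thp' /proc/vmstat                     # THP fault/compaction counters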
Re: [PERFORM] MusicBrainz postgres performance issues
On 15.3.2015 13:07, Robert Kaye wrote:
>> If the load problem really is being caused by swapping [...] what does cat /proc/sys/vm/swappiness show on your server?
>
> 0. We adjusted that too, but no effect. (I've updated the blog post with these two comments.)

IMHO setting swappiness to 0 is way too aggressive. Just set it to something like 10 - 20; that works better in my experience.

>> There are other linux memory management things that can cause postgres and the server running it to throw fits, like THP and zone reclaim. [...] That would at least give you another angle to investigate.
>
> If there are specific things you'd like to know, I'd be happy to be a human proxy. :)

I'd start with the vm.* configuration, so the output from this:

    # sysctl -a | grep '^vm.*'

and possibly /proc/meminfo. I'm especially interested in the overcommit settings, because per the free output you provided there's ~16GB of free RAM.

BTW, what amounts of data are we talking about? How large is the database and how large is the active set?

I also noticed you use kernel 3.2 - that's not the best kernel version for PostgreSQL - see [1] or [2] for example.

[1] https://medium.com/postgresql-talk/benchmarking-postgresql-with-different-linux-kernel-versions-on-ubuntu-lts-e61d57b70dd4
[2] http://www.databasesoup.com/2014/09/why-you-need-to-avoid-linux-kernel-32.html

--
Tomas Vondra -- http://www.2ndQuadrant.com/ -- PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
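[As a rough illustration of the vm.* settings being asked about, a commonly recommended baseline for a dedicated Postgres box looks something like the sketch below. These are generic numbers, not a recommendation tuned to this server:]

    # /etc/sysctl.conf (sketch)
    vm.swappiness = 10             # swap reluctantly, but don't forbid it outright
    vm.overcommit_memory = 2       # refuse allocations beyond the commit limit
    vm.overcommit_ratio = 80       # commit limit = swap + 80% of RAM
    vm.zone_reclaim_mode = 0       # don't stall to reclaim within a single NUMA zone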
Re: [PERFORM] MusicBrainz postgres performance issues
On 2015-03-15 13:07:25 +0100, Robert Kaye wrote:
> [free -m output, quoted above]

Could you post /proc/meminfo instead? That gives a fair bit more information.

Also:
* What hardware is this running on?
* Why do you need 500 connections (that are nearly all used) when you have a pgbouncer in front of the database? That's not going to be efficient.
* Do you have any data tracking the state connections are in, i.e. whether they're idle or not? The connections graph you linked doesn't give that information.
* You're apparently not graphing CPU usage. How busy are the CPUs? How much time is spent in the kernel (i.e. system)?
* Consider installing perf (linux-utils-$something) and doing a systemwide profile.

3.2 isn't the greatest kernel around, efficiency wise. At some point you might want to upgrade to something newer. I've seen remarkable differences around this.

You really should upgrade postgres to a newer major version one of these days. Especially 9.2 can give you a remarkable improvement in performance with many connections in a read mostly workload.

Greetings,

Andres Freund
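[A concrete way to take the perf suggestion: a short system-wide profile. Package names vary by distro; on Debian/Ubuntu the tool lives in linux-tools-*:]

    perf record -a -g -- sleep 30   # sample all CPUs, with call graphs, for 30 seconds
    perf report                     # interactive breakdown of where the time went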
Re: [PERFORM] MusicBrainz postgres performance issues
On Mar 15, 2015, at 12:41 PM, Andreas Kretschmer <akretsch...@spamfence.net> wrote:
> just a wild guess: raid-controller BBU faulty

We don't have a BBU in this server, but at least we have redundant power supplies. In any case, how would a faulty battery possibly cause this?

--
--ruaok  Robert Kaye -- r...@musicbrainz.org -- http://musicbrainz.org
Re: [PERFORM] MusicBrainz postgres performance issues
Hi!

What does your pg_stat_bgwriter show for one day?

On Mar 15, 2015, at 11:54, Robert Kaye <r...@musicbrainz.org> wrote:
> Hi! We at MusicBrainz have been having trouble with our Postgres install for the past few days. I've collected all the relevant information here: http://blog.musicbrainz.org/2015/03/15/postgres-troubles/ If anyone could provide tips, suggestions or other relevant advice for what to poke at next, we would love it. Thanks!
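[A minimal way to pull the numbers being asked for, using the standard pg_stat_bgwriter columns; diff two snapshots taken a day apart to get a per-day figure:]

    -- checkpoints_req much larger than checkpoints_timed usually means
    -- checkpoint_segments is too low for the write load
    SELECT checkpoints_timed, checkpoints_req,
           buffers_checkpoint, buffers_clean, buffers_backend
    FROM pg_stat_bgwriter;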
Re: [PERFORM] MusicBrainz postgres performance issues
Robert Kaye <r...@musicbrainz.org> wrote:
> Hi! We at MusicBrainz have been having trouble with our Postgres install for the past few days. I've collected all the relevant information here: http://blog.musicbrainz.org/2015/03/15/postgres-troubles/ If anyone could provide tips, suggestions or other relevant advice for what to poke at next, we would love it.

Just a wild guess: raid-controller BBU faulty?

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds)
If I was god, I would recompile penguin with --enable-fly. (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
Re: [PERFORM] MusicBrainz postgres performance issues
It sounds like you've hit the postgres basics; what about some of the linux checklist items? What does free -m show on your db server?

If the load problem really is being caused by swapping when things really shouldn't be swapping, it could be a matter of adjusting your swappiness - what does cat /proc/sys/vm/swappiness show on your server?

There are other linux memory management things that can cause postgres and the server running it to throw fits, like THP and zone reclaim. I don't have enough info about your system to say they are the cause either, but check out the many postings here and other places on the detrimental effect that those settings *can* have. That would at least give you another angle to investigate.
Re: [PERFORM] MusicBrainz postgres performance issues
On Sun, Mar 15, 2015 at 11:09 AM, Scott Marlowe <scott.marl...@gmail.com> wrote:
> Clarification: 64MB work_mem AND max_connections = 500 is a recipe for disaster. No db can actively process 500 queries at once without going kaboom, and having 64MB work_mem means it will go kaboom long before it reaches 500 active connections. Lower that and let pgbouncer handle the extra connections for you.

Lower max_connections. A work_mem of 64MB is fine as long as max_connections is something reasonable (reasonable is generally #CPU cores * 2 or so).

work_mem is per sort. A single query could easily use 2 or 4x work_mem all by itself. You can see how having hundreds of active connections, each using 64MB or more at the same time, can kill your server.
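[To make "per sort" concrete, here is a hypothetical query (table and column names invented for illustration) whose plan grabs several work_mem-sized allocations at once:]

    -- each of these plan nodes can claim up to work_mem on its own:
    EXPLAIN ANALYZE
    SELECT r.artist_id, count(*) AS n
    FROM recording r
    JOIN release rel ON rel.id = r.release_id   -- Hash Join: one allocation
    GROUP BY r.artist_id                        -- HashAggregate: a second
    ORDER BY n DESC;                            -- Sort: a third

[With work_mem = 64MB, that single query can legitimately use ~192MB; multiply by a few hundred active connections and RAM is gone.]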
Re: [PERFORM] MusicBrainz postgres performance issues
On 2015-03-15 11:09:34 -0600, Scott Marlowe wrote:
> shared_mem of 12G is almost always too large. I'd drop it down to ~1G or so.

I think that's an outdated wisdom, i.e. not generally true. I've now seen a significant number of systems where a larger shared_buffers can help quite massively.

The primary case where it can, in my experience, go bad is write-mostly databases where every buffer acquisition has to write out dirty data while holding locks. Especially during relation extension that's bad.

A new enough kernel, a sane filesystem (i.e. not ext3) and a sane checkpoint configuration take care of most of the other disadvantages.

Greetings,

Andres Freund
Re: [PERFORM] MusicBrainz postgres performance issues
On 2015-03-15 20:42:51 +0300, Ilya Kosmodemiansky wrote:
> On Sun, Mar 15, 2015 at 8:20 PM, Andres Freund <and...@2ndquadrant.com> wrote:
>> On 2015-03-15 11:09:34 -0600, Scott Marlowe wrote:
>>> shared_mem of 12G is almost always too large. I'd drop it down to ~1G or so.
>>
>> I think that's an outdated wisdom, i.e. not generally true.
>
> Quite agreed. With the note that a properly configured controller with a BBU is needed.

That imo doesn't really have anything to do with it. The primary benefit of a BBU with writeback caching is accelerating (near-)synchronous writes, like the WAL. But, besides influencing the default for wal_buffers, a larger shared_buffers doesn't change the amount of synchronous writes.

Greetings,

Andres Freund
Re: [PERFORM] MusicBrainz postgres performance issues
On Sun, Mar 15, 2015 at 8:46 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> That imo doesn't really have anything to do with it. The primary benefit of a BBU with writeback caching is accelerating (near-)synchronous writes. Like the WAL.

My point was that with no proper raid controller (today a BBU is surely needed for the controller to be a proper one) plus heavy writes of any kind, it is absolutely impossible to live with large shared_buffers and without IO problems.

--
Ilya Kosmodemiansky, PostgreSQL-Consulting.com
i...@postgresql-consulting.com
Re: [PERFORM] MusicBrainz postgres performance issues
On Sun, Mar 15, 2015 at 7:50 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> [Andres's checklist quoted above: /proc/meminfo, hardware, why 500 connections, connection states]
>
> * You're apparently not graphing CPU usage. How busy are the CPUs? How much time is spent in the kernel (i.e. system)?

htop is a great tool for watching the CPU cores live. Red == kernel, btw.

> * Consider installing perf (linux-utils-$something) and doing a systemwide profile. 3.2 isn't the greatest kernel around, efficiency wise. At some point you might want to upgrade to something newer. I've seen remarkable differences around this.

That is an understatement. Here's a nice article on why it's borked: http://www.databasesoup.com/2014/09/why-you-need-to-avoid-linux-kernel-32.html

Had a 32 core machine with big RAID BBU and 512GB memory that was dying using the 3.2 kernel. Went to 3.11, and it went from a load of 20 to 40 down to a load of 5.

> You really should upgrade postgres to a newer major version one of these days. Especially 9.2 can give you a remarkable improvement in performance with many connections in a read mostly workload.

Agreed. Ubuntu 12.04 with kernel 3.11/3.13 and pg 9.2 has been a great improvement over the debian squeeze and pg 8.4 that we were running at work until recently.

As for the OP: if you've got swap activity causing issues when there's plenty of free space, just TURN IT OFF:

    swapoff -a

I do this on all my big memory servers that don't really need swap, especially when I was using the 3.2 kernel, which seems broken as regards swap on bigger memory machines.
Re: [PERFORM] MusicBrainz postgres performance issues
On 03/15/2015 09:43 AM, Scott Marlowe wrote:
>> * Consider installing perf (linux-utils-$something) and doing a systemwide profile. 3.2 isn't the greatest kernel around, efficiency wise. At some point you might want to upgrade to something newer. I've seen remarkable differences around this.

Not "at some point" -- now. 3.2 - 3.8 are undeniably broken for PostgreSQL.

> That is an understatement. Here's a nice article on why it's borked: http://www.databasesoup.com/2014/09/why-you-need-to-avoid-linux-kernel-32.html
>
> Had a 32 core machine with big RAID BBU and 512GB memory that was dying using the 3.2 kernel. Went to 3.11, and it went from a load of 20 to 40 down to a load of 5.

Yep, I can confirm this behavior.

>> You really should upgrade postgres to a newer major version one of these days. Especially 9.2 can give you a remarkable improvement in performance with many connections in a read mostly workload.

Seconded.

JD
--
Command Prompt, Inc. - http://www.commandprompt.com/ - 503-667-4564
PostgreSQL Support, Training, Professional Services and Development
Re: [PERFORM] MusicBrainz postgres performance issues
On Sun, Mar 15, 2015 at 10:43 AM, Scott Marlowe <scott.marl...@gmail.com> wrote:
> [Scott's previous reply, quoted in full above]

OK, I've now read your blog post. A few pointers I'd make.

shared_mem of 12G is almost always too large. I'd drop it down to ~1G or so.

64MB work_mem AND max_connections = 500 is a recipe for disaster. No db can actively process 500 queries at once without going kaboom, and having 64MB work_mem means it will go kaboom long before it reaches 500 active connections. Lower that and let pgbouncer handle the extra connections for you.

Get some monitoring installed if you don't already have it so you can track memory usage, cpu usage, disk usage etc. Zabbix or Nagios work well. Without some kind of system monitoring you're missing half the information you need to troubleshoot with.

Install iotop, sysstat, and htop. Configure sysstat to collect data so you can use sar to see what the machine's been doing in the past few days etc. Set it to 1 minute intervals in the /etc/cron.d/sysstat file.

Do whatever you have to to get kernel 3.11 or greater on that machine (or a new one). You don't have to upgrade pg just yet, but the upgrade of the kernel is essential.

Good luck. Let us know what you find and if you can get that machine back on its feet.
--
To understand recursion, one must first understand recursion.
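[For the sysstat tweak: the stock Debian/Ubuntu /etc/cron.d/sysstat runs the sa1 collector every 10 minutes, and bumping it to every minute is a one-line change. A sketch -- the exact stock contents vary by distro version:]

    # /etc/cron.d/sysstat -- collect activity snapshots every minute instead of every 10
    * * * * * root command -v debian-sa1 > /dev/null && debian-sa1 1 1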
Re: [PERFORM] MusicBrainz postgres performance issues
On Sun, Mar 15, 2015 at 11:46 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> That imo doesn't really have anything to do with it. The primary benefit of a BBU with writeback caching is accelerating (near-)synchronous writes. Like the WAL. But, besides influencing the default for wal_buffers, a larger shared_buffers doesn't change the amount of synchronous writes.

Here's the problem with a large shared_buffers on a machine that's getting pushed into swap: it starts to swap BUFFERs. Once buffers start getting swapped you're not just losing performance; that huge shared_buffers is now working against you, because what you THINK are buffers in RAM to make things faster are in fact blocks on a hard drive being swapped in and out during reads. It's the exact opposite of fast. :)
Re: [PERFORM] MusicBrainz postgres performance issues
On 03/15/2015 05:08 AM, Robert Kaye wrote:
>> just a wild guess: raid-controller BBU faulty
>
> We don't have a BBU in this server, but at least we have redundant power supplies. In any case, how would a faulty battery possibly cause this?

The controller would turn off the cache.

JD
--
Command Prompt, Inc. - http://www.commandprompt.com/ - 503-667-4564
PostgreSQL Support, Training, Professional Services and Development
Re: [PERFORM] MusicBrainz postgres performance issues
On Sun, Mar 15, 2015 at 8:20 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> On 2015-03-15 11:09:34 -0600, Scott Marlowe wrote:
>> shared_mem of 12G is almost always too large. I'd drop it down to ~1G or so.
>
> I think that's an outdated wisdom, i.e. not generally true.

Quite agreed. With the note that a properly configured controller with a BBU is needed.

> A new enough kernel, a sane filesystem (i.e. not ext3) and a sane checkpoint configuration take care of most of the other disadvantages.

Most likely. And better to be sure that the filesystem is mounted without barriers.

And I agree with Scott - 64MB work_mem AND max_connections = 500 is a recipe for disaster. The problem could be the session mode of pgbouncer. If you can work with transaction mode - do it.

Best regards,
Ilya Kosmodemiansky, PostgreSQL-Consulting.com
i...@postgresql-consulting.com
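[The session-vs-transaction distinction is a single PgBouncer setting; a minimal sketch, other options omitted:]

    ; pgbouncer.ini
    [pgbouncer]
    pool_mode = transaction   ; release the server connection at end of transaction,
                              ; instead of holding it for the whole client session

[The caveat is that transaction pooling breaks session-level state -- prepared statements, advisory locks, SET without LOCAL -- so the application has to avoid those.]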
Re: [PERFORM] MusicBrainz postgres performance issues
Why is 500 connections insane? We got 32 CPU with 96GB and 3000 max connections, and we are doing fine, even when hitting our max concurrent connection peaks around 4500. At a previous site, we were using 2000 max connections on 24 CPU and 64GB RAM, with about 1500 max concurrent connections. So I wouldn't be too hasty in saying more than 500 is asking for trouble. Just as long as you got your kernel resources set high enough to sustain it (SHMMAX, SHMALL, SEMMNI, and ulimits), and RAM for work_mem.

[Replies from Tomas Vondra, Andres Freund and Scott Marlowe quoted in full; they appear elsewhere in this thread.]
Re: [PERFORM] MusicBrainz postgres performance issues
On 15.3.2015 23:47, Andres Freund wrote:
> On 2015-03-15 12:25:07 -0600, Scott Marlowe wrote:
>> Here's the problem with a large shared_buffers on a machine that's getting pushed into swap. It starts to swap BUFFERs. [...]
>
> IMNSHO that's tackling things from the wrong end. If 12GB of shared buffers drive your 48GB dedicated OLTP postgres server into swapping out actively used pages, the problem isn't the 12GB of shared buffers, but that you require so much memory for other things. That needs to be fixed.

I second this opinion.

As was already pointed out, the 500 connections is rather insane (assuming the machine does not have hundreds of cores). If there are memory pressure issues, it's likely because many queries are performing memory-expensive operations at the same time (might even be a bad estimate causing hashagg to use much more than work_mem).

> But! We haven't even established that swapping is an actual problem here. The ~2GB of swapped out memory could just as well be the java raid controller management monstrosity or something similar. Those pages won't ever be used and thus can better be used to buffer IO.
>
> You can check what's actually swapped out using:
>
>     grep ^VmSwap /proc/[0-9]*/status | grep -v '0 kB'
>
> For swapping to be actually harmful you need to have pages that are regularly swapped in. vmstat will tell.

I've already asked for vmstat logs, so let's wait.

> In a concurrent OLTP workload (~450 established connections do suggest that) with a fair amount of data, keeping the hot data set in shared_buffers can significantly reduce problems. Constantly searching for victim buffers isn't a nice thing, and that will happen if your most frequently used data doesn't fit into s_b. On the other hand, if your data set is so large that even the hottest part doesn't fit into memory (perhaps because there's no hottest part, as there's no locality at all), a smaller shared_buffers can make things more efficient, because the search for replacement buffers is cheaper with a smaller shared_buffers setting.

I've met many systems with max_connections values this high, and it was mostly idle connections because of separate connection pools on each application server. So mostly idle (90% of the time), but at peak time all the application servers want to do stuff at the same time. And it all goes KABOOOM! just like here.

--
Tomas Vondra -- http://www.2ndQuadrant.com/ -- PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
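[To act on the "vmstat will tell" point: run it with an interval and watch the si/so columns, which report KB/s swapped in and out. A sketch of what harmful swapping looks like, with made-up numbers and trailing columns trimmed:]

    $ vmstat 5
    procs -----------memory---------- ---swap-- ...
     r  b   swpd   free   buff  cache   si   so
     8  3 2439168 201312  5120 981220  812  640

[Sustained nonzero si means pages are being swapped back in under load -- the harmful case. Swapped-out-but-never-touched pages instead show a large swpd with si/so staying at 0, which is harmless.]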
Re: [PERFORM] MusicBrainz postgres performance issues
(Please quote properly.)

On 2015-03-15 19:55:23 -0400, mich...@sqlexec.com wrote:
> Why is 500 connections insane? We got 32 CPU with 96GB and 3000 max connections, and we are doing fine, even when hitting our max concurrent connection peaks around 4500. At a previous site, we were using 2000 max connections on 24 CPU and 64GB RAM, with about 1500 max concurrent connections. So I wouldn't be too hasty in saying more than 500 is asking for trouble. Just as long as you got your kernel resources set high enough to sustain it (SHMMAX, SHMALL, SEMMNI, and ulimits), and RAM for work_mem.

It may work acceptably in some scenarios, but it can lead to significant problems. Several things in postgres scale linearly with max_connections (from the algorithmic point of view; often CPU characteristics like cache sizes make it worse), most notably acquiring a snapshot. It usually works ok enough if you don't have a high number of queries per second, but if you do, you can run into horrible contention problems. Absurdly enough, that matters *more* on bigger machines with several sockets. It's especially bad on 4+ socket systems.

The other aspect is that such a high number of full connections usually just isn't helpful for throughput. Not even the most massive NUMA systems (~256 hardware threads is the realistic max atm, IIRC) can process 4.5k queries at the same time. It'll often be much more efficient if all connections above a certain number aren't allocated a full postgres backend, with all its overhead, but use a much more lightweight pooler connection.

Greetings,

Andres Freund
Re: [PERFORM] MusicBrainz postgres performance issues
On 16/03/15 13:07, Tomas Vondra wrote:
> On 16.3.2015 00:55, mich...@sqlexec.com wrote:
>> Why is 500 connections insane? [...]
>
> [...] Also, if part of the query required a certain amount of memory for part of the plan, it now holds that memory for much longer too. That only increases the chance of OOM issues. [...]

Also, you could get a situation where the data, relevant indexes, and working memory etc. for a small number of queries can all just fit into RAM, but the extra queries suddenly reduce the available RAM so that even those queries spill to disk, on top of the time required to process the extra queries. So a nicely behaved system could suddenly get a lot worse. Even before you consider additional lock contention and other nasty things!

It all depends...

Cheers,
Gavin
Re: [PERFORM] MusicBrainz postgres performance issues
On 2015-03-15 20:54:51 +0300, Ilya Kosmodemiansky wrote:
> My point was that with no proper raid controller (today a BBU is surely needed for the controller to be a proper one) plus heavy writes of any kind, it is absolutely impossible to live with large shared_buffers and without IO problems.

And my point is that you're mostly wrong. What a raid controller's writeback cache usefully accelerates is synchronous writes, i.e. writes that the application waits for. Usually raid controllers don't have much chance to reorder the queued writes (i.e. turning neighboring writes into one larger sequential write). What they do excel at is making synchronous writes to disk return faster, because the data is only written to the controller's memory, not to actual storage. They're also good at avoiding actual writes to disk when the *same* page is written multiple times in a short amount of time.

In postgres, writes for data that goes through shared_buffers are usually asynchronous. We write them to the OS's page cache when a page is needed for other contents, is undirtied by the bgwriter, or is written out during a checkpoint; but we do *not* wait for the write to hit the disk. The controller's writeback cache doesn't hugely help here, because it doesn't make *that* much of a difference whether the dirty data stays in the kernel's page cache or in the controller.

In contrast to that, writes to the WAL are often more or less synchronous. We actually wait (via fdatasync()/fsync() syscalls) for writes to hit disk in a bunch of scenarios, most commonly when committing a transaction. Unless synchronous_commit = off, every COMMIT in a transaction that wrote data implies an fdatasync() of a file in pg_xlog (well, we do optimize that in some conditions, but let's leave that out for now).

Additionally, if there are many smaller/interleaved transactions, we will write()/fdatasync() out the same 8kb WAL page repeatedly. Every time a transaction commits (and some other things), the page that commit record is on will be flushed. As the WAL records for insertions, updates, deletes and commits are frequently much smaller than 8kb, that will often happen 20-100 times for the same page in OLTP scenarios with narrow rows.

That's why synchronous_commit = off can be such a huge win for OLTP write workloads without a writeback cache - synchronous writes are turned into asynchronous writes, and repetitive writes to the same page are avoided. It also explains why synchronous_commit = off has much less of an effect for bulk write workloads: as there are no synchronous disk writes due to WAL flushes at commit time (there's only very few commits), synchronous commit doesn't have much of an effect.

That said, there are a couple of reasons why you're not completely wrong:

Historically, when using ext3 with data=ordered and some other filesystems, synchronous writes to one file forced *all* other previously dirtied data to also be flushed. That means that if you have pg_xlog and the normal relation files on the same filesystem, the synchronous writes to the WAL will not only have to write out the new WAL (often not that much data), but also all the other dirty data. The OS will often be unable to do efficient write combining in that case, because a) there's not yet that much data there, b) postgres doesn't order writes during checkpoints. That means that WAL writes will suddenly have to write out much more data => COMMITs are slow. That's where the suggestion to keep pg_xlog on a separate partition largely comes from.

Writes going through shared_buffers are sometimes indirectly turned into synchronous writes (from the OS's perspective at least, which means they'll be done at a higher priority). That happens when the checkpointer fsync()s all the files at the end of a checkpoint. When things are going well and checkpoints are executed infrequently and completely guided by time (i.e. triggered solely by checkpoint_timeout, and not checkpoint_segments), that's usually not too bad; you'll see a relatively small latency spike for transactions.

Unfortunately the ext* filesystems have an implementation problem here which can make this much worse: the way writes are prioritized during an fsync() can stall out concurrent synchronous reads/writes pretty badly. That's much less of a problem with e.g. xfs, which is why I'd atm not suggest ext4 for write intensive applications.

The other situation where this can lead to big problems is if your checkpoints aren't scheduled by time (you can recognize that by enabling log_checkpoints and checking a) that time is the trigger, b) that they're actually happening in an interval consistent with checkpoint_timeout). If the relation files are not written out in a smoothed-out fashion (configured by checkpoint_completion_target), the fsync()s at the end of a checkpoint have to push out large amounts of dirty data at once, with the latency consequences described above.
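[A sketch of the postgresql.conf knobs this discussion keeps touching, with illustrative values only -- the right settings depend on durability requirements and write volume:]

    synchronous_commit = off            # commits no longer wait for WAL fdatasync();
                                        # a crash can lose a small window of recent commits
    wal_buffers = 16MB
    checkpoint_segments = 64            # (pre-9.5 setting) avoid WAL-volume-triggered checkpoints
    checkpoint_timeout = 15min          # let time, not segments, trigger checkpoints
    checkpoint_completion_target = 0.9  # smooth checkpoint writes across the interval
    log_checkpoints = on                # verify what actually triggers each checkpoint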
Re: [PERFORM] MusicBrainz postgres performance issues
On 16.3.2015 00:55, mich...@sqlexec.com wrote:
> Why is 500 connections insane? We got 32 CPU with 96GB and 3000 max connections, and we are doing fine, even when hitting our max concurrent connection peaks around 4500. At a previous site, we were using 2000 max connections on 24 CPU and 64GB RAM, with about 1500 max concurrent connections. So I wouldn't be too hasty in saying more than 500 is asking for trouble. Just as long as you got your kernel resources set high enough to sustain it (SHMMAX, SHMALL, SEMMNI, and ulimits), and RAM for work_mem.

If all the connections are active at the same time (i.e. running queries), they have to share the 32 cores somehow. Or I/O, if that's the bottleneck. In other words, you're not improving the throughput of the system; you're merely increasing latencies. And it may easily happen that the latency increase is not linear but grows faster, because of locking, context switches and other process-related management.

Imagine you have a query taking 1 second of CPU time. If you have 64 such queries running concurrently on 32 cores, each gets only 1/2 a CPU and so takes >=2 seconds. With 500 queries, it's >=15 seconds per query, etc. If those queries are acquiring the same locks (e.g. updating the same rows, or so), you can imagine what happens...

Also, if part of the query required a certain amount of memory for part of the plan, it now holds that memory for much longer too. That only increases the chance of OOM issues.

It may work fine when most of the connections are idle, but it makes storms like this possible.

--
Tomas Vondra -- http://www.2ndQuadrant.com/ -- PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PERFORM] MusicBrainz postgres performance issues
How many CPUs in play here on the PG Cluster Server, cat /proc/cpuinfo | grep processor | wc -l I see you got pg_stat_statements enabled, what are the SQL you experience during this heavy load time? And does explain on them show a lot of sorting activity that requires more work_mem. Please enable log_checkpoints, so we can see if your checkpoint_segments is adequate. Andres Freund mailto:and...@2ndquadrant.com Sunday, March 15, 2015 6:47 PM IMNSHO that's tackling things from the wrong end. If 12GB of shared buffers drive your 48GB dedicated OLTP postgres server into swapping out actively used pages, the problem isn't the 12GB of shared buffers, but that you require so much memory for other things. That needs to be fixed. But! We haven't even established that swapping is an actual problem here. The ~2GB of swapped out memory could just as well be the java raid controller management monstrosity or something similar. Those pages won't ever be used and thus can better be used to buffer IO. You can check what's actually swapped out using: grep ^VmSwap /proc/[0-9]*/status|grep -v '0 kB' For swapping to be actually harmful you need to have pages that are regularly swapped in. vmstat will tell. In a concurrent OLTP workload (~450 established connections do suggest that) with a fair amount of data keeping the hot data set in shared_buffers can significantly reduce problems. Constantly searching for victim buffers isn't a nice thing, and that will happen if your most frequently used data doesn't fit into s_b. On the other hand, if your data set is so large that even the hottest part doesn't fit into memory (perhaps because there's no hottest part as there's no locality at all), a smaller shared buffers can make things more efficient, because the search for replacement buffers is cheaper with a smaller shared buffers setting. Greetings, Andres Freund Scott Marlowe mailto:scott.marl...@gmail.com Sunday, March 15, 2015 2:25 PM Here's the problem with a large shared_buffers on a machine that's getting pushed into swap. It starts to swap BUFFERs. Once buffers start getting swapped you're not just losing performance, that huge shared_buffers is now working against you because what you THINK are buffers in RAM to make things faster are in fact blocks on a hard drive being swapped in and out during reads. It's the exact opposite of fast. :) Andres Freund mailto:and...@2ndquadrant.com Sunday, March 15, 2015 1:46 PM That imo doesn't really have anything to do with it. The primary benefit of a BBU with writeback caching is accelerating (near-)synchronous writes. Like the WAL. But, besides influencing the default for wal_buffers, a larger shared_buffers doesn't change the amount of synchronous writes. Greetings, Andres Freund Ilya Kosmodemiansky mailto:ilya.kosmodemian...@postgresql-consulting.com Sunday, March 15, 2015 1:42 PM On Sun, Mar 15, 2015 at 8:20 PM, Andres Freundand...@2ndquadrant.com wrote: On 2015-03-15 11:09:34 -0600, Scott Marlowe wrote: shared_mem of 12G is almost always too large. I'd drop it down to ~1G or so. I think that's a outdated wisdom, i.e. not generally true. Quite agreed. With note, that proper configured controller with BBU is needed. A new enough kernel, a sane filesystem (i.e. not ext3) and sane checkpoint configuration takes care of most of the other disadvantages. Most likely. And better to be sure that filesystem mounted without barrier. And I agree with Scott - 64MB work mem AND max_connections = 500 is a recipe for disaster. The problem could be in session mode of pgbouncer. 
Best regards,
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com

Andres Freund mailto:and...@2ndquadrant.com Sunday, March 15, 2015 1:20 PM

I think that's outdated wisdom, i.e. not generally true. I've now seen a significant number of systems where a larger shared_buffers can help quite massively. The primary case where it can, in my experience, go bad is write-mostly databases, where every buffer acquisition has to write out dirty data while holding locks. Especially during relation extension that's bad. A new enough kernel, a sane filesystem (i.e. not ext3) and a sane checkpoint configuration take care of most of the other disadvantages.

Greetings,

Andres Freund
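On the question of whether the hot set fits into s_b: the pg_buffercache contrib module shows what currently occupies shared buffers. A sketch, following the query pattern from the pg_buffercache documentation (the database name is a placeholder):

    $ psql -d yourdb -c "CREATE EXTENSION IF NOT EXISTS pg_buffercache;"
    $ psql -d yourdb -c "
        SELECT c.relname, count(*) AS buffers
          FROM pg_buffercache b
          JOIN pg_class c
            ON b.relfilenode = pg_relation_filenode(c.oid)
           AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                                      WHERE datname = current_database()))
         GROUP BY c.relname
         ORDER BY buffers DESC
         LIMIT 10;"

Run it a few times under load; if the same handful of relations dominate each time, the hot set fits, and if the contents churn between runs, it doesn't.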
Re: [PERFORM] MusicBrainz postgres performance issues
On 15.3.2015 18:54, Ilya Kosmodemiansky wrote:

On Sun, Mar 15, 2015 at 8:46 PM, Andres Freund and...@2ndquadrant.com wrote: That imo doesn't really have anything to do with it. The primary benefit of a BBU with writeback caching is accelerating (near-)synchronous writes. Like the WAL.

My point was that with no proper RAID controller (today a BBU is surely needed for the controller to count as a proper one) plus heavy writes of any kind, it is absolutely impossible to live with large shared_buffers and without IO problems.

That is not really true, IMHO.

The benefit of the write cache is that it can absorb a certain amount of writes, equal to the size of the cache (nowadays usually 512MB or 1GB), without forcing them to the disks. It however still has to flush the dirty data to the drives later, and that side usually has much lower throughput -- e.g. while you can easily write several GB/s to the controller, the drives usually handle only ~1MB/s of random writes each (I assume rotational drives here).

But if you do a lot of random writes (which is likely the case for write-heavy databases), you'll fill the write cache pretty soon and will be bounded by the drives anyway. The controller really can't help much with sequential writes, because the drives already handle those quite well. And SSDs are a completely different story, of course.

That does not mean the write cache is useless -- it can absorb short bursts of random writes, fix the RAID5 write hole, offload the parity computation, etc. Whenever someone asks me whether they should buy a RAID controller with write cache for their database server, my answer is absolutely yes in 95.23% of cases ...

... but really it's not something that magically changes the limits for write-heavy databases -- the main limit is still the drives.

regards
Tomas

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
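To put a number on what the drives behind the cache can actually sustain, fio can measure random-write throughput directly. A sketch only (the test file path is a placeholder, and you don't want to point this at a busy production volume), using 8k blocks to match Postgres's page size:

    $ fio --name=randwrite --filename=/mnt/test/fio.dat --size=1G \
          --rw=randwrite --bs=8k --direct=1 --runtime=60 --time_based

On a single rotational drive the steady-state result typically lands in the low single-digit MB/s range -- the bound Tomas describes once the controller's cache has filled.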
Re: [PERFORM] MusicBrainz postgres performance issues
I agree with your counterargument about how a high max_connections can cause problems, but max_connections may not be part of the problem here. There's a bunch of "it depends" stuff in there based on workload details, # of CPUs, RAM, etc.

I'm still waiting to find out how many CPUs are on this DB server. Did I miss it somewhere in the email thread below?

Tomas Vondra mailto:tomas.von...@2ndquadrant.com Sunday, March 15, 2015 8:07 PM

If all the connections are active at the same time (i.e. running queries), they have to share the 32 cores somehow. Or I/O, if that's the bottleneck. In other words, you're not improving the throughput of the system, you're merely increasing latencies. And it may easily happen that the latency increase is not linear, but grows faster -- because of locking, context switches and other process-related management.

Imagine you have a query taking 1 second of CPU time. If you have 64 such queries running concurrently on 32 cores, each gets only 1/2 a CPU and so takes >= 2 seconds. With 500 queries, it's >= 15 seconds per query, etc. If those queries are acquiring the same locks (e.g. updating the same rows, or so), you can imagine what happens ...

Also, if a query requires a certain amount of memory for part of its plan, it now holds that memory for much longer too. That only increases the chance of OOM issues.

It may work fine when most of the connections are idle, but it makes storms like this possible.

mich...@sqlexec.com mailto:mich...@sqlexec.com Sunday, March 15, 2015 7:55 PM

Why is 500 connections insane? We've got 32 CPUs with 96GB and 3000 max connections, and we are doing fine, even when hitting our max concurrent connection peaks around 4500. At a previous site, we were using 2000 max connections on 24 CPUs and 64GB RAM, with about 1500 max concurrent connections. So I wouldn't be too hasty in saying more than 500 is asking for trouble. Just as long as you've got your kernel resources set high enough to sustain it (SHMMAX, SHMALL, SEMMNI, and ulimits), and RAM for work_mem.

Tomas Vondra mailto:tomas.von...@2ndquadrant.com Sunday, March 15, 2015 7:41 PM

On 15.3.2015 23:47, Andres Freund wrote:

On 2015-03-15 12:25:07 -0600, Scott Marlowe wrote: Here's the problem with a large shared_buffers on a machine that's getting pushed into swap. It starts to swap BUFFERs. Once buffers start getting swapped you're not just losing performance, that huge shared_buffers is now working against you because what you THINK are buffers in RAM to make things faster are in fact blocks on a hard drive being swapped in and out during reads. It's the exact opposite of fast. :)

IMNSHO that's tackling things from the wrong end. If 12GB of shared buffers drive your 48GB dedicated OLTP postgres server into swapping out actively used pages, the problem isn't the 12GB of shared buffers, but that you require so much memory for other things. That needs to be fixed.

I second this opinion. As was already pointed out, the 500 connections figure is rather insane (assuming the machine does not have hundreds of cores). If there are memory pressure issues, it's likely because many queries are performing memory-expensive operations at the same time (might even be a bad estimate causing a hashagg to use much more than work_mem).

But! We haven't even established that swapping is an actual problem here. The ~2GB of swapped out memory could just as well be the java raid controller management monstrosity or something similar. Those pages won't ever be used and thus can better be used to buffer IO.
You can check what's actually swapped out using:

    grep ^VmSwap /proc/[0-9]*/status|grep -v '0 kB'

For swapping to be actually harmful you need to have pages that are regularly swapped in. vmstat will tell.

I've already asked for vmstat logs, so let's wait.

In a concurrent OLTP workload (~450 established connections do suggest that) with a fair amount of data, keeping the hot data set in shared_buffers can significantly reduce problems. Constantly searching for victim buffers isn't a nice thing, and that will happen if your most frequently used data doesn't fit into s_b. On the other hand, if your data set is so large that even the hottest part doesn't fit into memory (perhaps because there's no hottest part, as there's no locality at all), a smaller shared_buffers can make things more efficient, because the search for replacement buffers is cheaper with a smaller shared_buffers setting.

I've met many systems with max_connections values this high, and it was mostly idle connections because of separate connection pools on each application server. So mostly idle (90% of the time), but at peak time all the application servers want to do stuff at the same time. And it all goes KABOOOM! just like here.
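For the vmstat check both Andres and Tomas refer to, a sketch of what to watch:

    $ vmstat 5
    # si / so = KB/s swapped in / out. Occasional 'so' bursts are usually
    # harmless; sustained non-zero 'si' under load means actively used
    # pages are being evicted and re-read -- the harmful case.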
Re: [PERFORM] MusicBrainz postgres performance issues
On 2015-03-15 12:25:07 -0600, Scott Marlowe wrote: Here's the problem with a large shared_buffers on a machine that's getting pushed into swap. It starts to swap BUFFERs. Once buffers start getting swapped you're not just losing performance, that huge shared_buffers is now working against you because what you THINK are buffers in RAM to make things faster are in fact blocks on a hard drive being swapped in and out during reads. It's the exact opposite of fast. :)

IMNSHO that's tackling things from the wrong end. If 12GB of shared buffers drive your 48GB dedicated OLTP postgres server into swapping out actively used pages, the problem isn't the 12GB of shared buffers, but that you require so much memory for other things. That needs to be fixed.

But! We haven't even established that swapping is an actual problem here. The ~2GB of swapped out memory could just as well be the java raid controller management monstrosity or something similar. Those pages won't ever be used and thus can better be used to buffer IO. You can check what's actually swapped out using:

    grep ^VmSwap /proc/[0-9]*/status|grep -v '0 kB'

For swapping to be actually harmful you need to have pages that are regularly swapped in. vmstat will tell.

In a concurrent OLTP workload (~450 established connections do suggest that) with a fair amount of data, keeping the hot data set in shared_buffers can significantly reduce problems. Constantly searching for victim buffers isn't a nice thing, and that will happen if your most frequently used data doesn't fit into s_b. On the other hand, if your data set is so large that even the hottest part doesn't fit into memory (perhaps because there's no hottest part, as there's no locality at all), a smaller shared_buffers can make things more efficient, because the search for replacement buffers is cheaper with a smaller shared_buffers setting.

Greetings,

Andres Freund

--
Andres Freund                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
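A small elaboration of that grep, as a sketch, to name the processes holding the most swap (assumes /proc/<pid>/comm, which is available on 3.x kernels):

    $ for p in /proc/[0-9]*; do
          s=$(awk '/^VmSwap/ {print $2}' "$p/status" 2>/dev/null)
          [ -n "$s" ] && [ "$s" -gt 0 ] && echo "$s kB  $(cat $p/comm 2>/dev/null)"
      done | sort -rn | head

If the top entries are management daemons rather than postgres backends, the swapped-out memory is likely harmless, as Andres suggests.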
Re: [PERFORM] MusicBrainz postgres performance issues
On 3/15/2015 6:54 AM, Robert Kaye wrote:

Hi!

We at MusicBrainz have been having trouble with our Postgres install for the past few days. I've collected all the relevant information here: http://blog.musicbrainz.org/2015/03/15/postgres-troubles/

If anyone could provide tips, suggestions or other relevant advice for what to poke at next, we would love it.

Robert,

Wow -- you've engaged the wizards indeed. I haven't heard or seen anything that would answer my *second* question if faced with this (my first would have been "what changed?").

What is the database actually trying to do when it spikes? E.g. what queries are running? Is there any pattern in the specific activity when it spikes -- exactly the same query, the same query with different data, or even just the same tables, and/or the same users, the same apps?

I know from experience that well-behaved queries can stop being well behaved if the underlying data changes. And for the experts... what would a corrupt index do to memory usage?

Roxanne
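For the "what queries are running" part, a snapshot of pg_stat_activity taken during a spike usually answers it. A sketch, assuming PostgreSQL 9.2 or later (older releases expose current_query instead of state/query):

    $ psql -c "SELECT pid, usename, now() - query_start AS runtime, waiting, query
                 FROM pg_stat_activity
                WHERE state <> 'idle'
                ORDER BY runtime DESC;"

Taking it a few times a minute apart makes the pattern Roxanne asks about -- same query, same tables, same users -- fairly obvious.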