Re: [PERFORM] Scalability in postgres
On Sat, May 30, 2009 at 9:41 PM, Greg Smith wrote: > On Fri, 29 May 2009, Scott Carey wrote: > >> There are operations/IT people won't touch Ubuntu etc with a ten foot pole >> yet for production. > > The only thing I was suggesting is that because 2.6.28 is the latest Ubuntu > kernel, that means it's gotten a lot more exposure and testing than, say, > other options like 2.6.27 or 2.6.29. > > I build a fair number of RedHat/CentOS systems with an upgraded kernel based > on mature releases from kernel.org, and a config as close as possible to the > original RedHat one, with the generic kernel defaults for all the new > settings. I keep liking that combination better than just using an Ubuntu > version with a newer kernel. I've seen a couple of odd kernel setting > choices in Ubuntu releases before that motivate that choice; the scheduler > trainwreck described at > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/188226 comes to mind. 8.04 was a frakking train wreck in many ways. It wasn't until 8.04.2 came out that it was even close to useable as a server OS, and even then, not for databases yet. It's still got broken bits and pieces marked "fixed in 8.10"... Uh, hello, it's your LTS release, fixes should be made there as a priority. There's a reason my dbs run on Centos / RHEL. It's not the fastest release ever, but it doesn't go down on me and it just works. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] degenerate performance on one server of 3
I have 3 servers, all with identical databases, and each performing very differently for the same queries. www3 is my fastest, www2 is the worst, and www1 is in the middle... even though www2 has more ram, faster CPU and faster drives (by far), and is running a newer version of postgres. I have been reluctant to post because I know it's something that I'm doing wrong in the settings or something that I should be able to figure out. Last year at this time www2 was the fastest... in fact, I bought the machine to be my "primary" server, and it performed as such with the striped volumes and higher RAM, it outpaced the other 2 in every query. It has, over time, "degenerated" to being so slow it frequently has to be taken out of the load-balance set. The only major changes to the sever have been "yum update (or ports upgrade)" to the newer releases ... over time. The query planner "knows" about the problem, but I'm not sure *why* there's a difference... since the tables all have the same data ... loaded from a dump nightly. The planner shows a different number of "rows" even though the items table has 22680 rows in all 3 instances. I ran a vacuum analyze just before these runs hoping to get them all into a similar "clean" state. The difference is outlined below, with the query planner output from a table-scan query that greatly exaggerates the differences in performance, along with some info about the configuration and platform differences. QUERY: explain select count(*) from items where name like '%a%' www3: psql (PostgreSQL) 8.1.14 www3: Linux www3 2.6.23.17-88.fc7 #1 SMP Thu May 15 00:02:29 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux www3: Mem: 1996288k total, 1537576k used, 458712k free,23124k buffers www3: Swap:0k total,0k used,0k free, 1383208k cached www3: shared_buffers = 1# min 16 or max_connections*2, 8KB each www3: QUERY PLAN www3: -- www3: Aggregate (cost=3910.07..3910.08 rows=1 width=0) www3:-> Seq Scan on items (cost=0.00..3853.39 rows=22671 width=0) www3: Filter: (name ~~ '%a%'::text) www3: (3 rows) www3: www1: psql (PostgreSQL) 8.1.17 www1: Linux www1 2.6.26.8-57.fc8 #1 SMP Thu Dec 18 18:59:49 EST 2008 x86_64 x86_64 x86_64 GNU/Linux www1: Mem: 1019376k total, 973064k used,46312k free,27084k buffers www1: Swap: 1959888k total,17656k used, 1942232k free, 769776k cached www1: shared_buffers = 6000 # min 16 or max_connections*2, 8KB each www1: QUERY PLAN www1: -- www1: Aggregate (cost=5206.20..5206.21 rows=1 width=0) www1:-> Seq Scan on items (cost=0.00..5149.50 rows=22680 width=0) www1: Filter: (name ~~ '%a%'::text) www1: (3 rows) www1: www2: psql (PostgreSQL) 8.2.13 www2: FreeBSD www2 6.3-RELEASE-p7 FreeBSD 6.3-RELEASE-p7 #0: Sun Dec 21 03:24:04 UTC 2008 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/SMP amd64 www2: Mem: 57M Active, 1078M Inact, 284M Wired, 88M Cache, 213M Buf, 10M Free www2: Swap: 4065M Total, 144K Used, 4065M Free www2: shared_buffers = 360MB # min 128kB or max_connections*16kB www2: QUERY PLAN www2: -- www2: Aggregate (cost=17659.45..17659.46 rows=1 width=0) www2:-> Seq Scan on items (cost=0.00..17652.24 rows=2886 width=0) www2: Filter: (name ~~ '%a%'::text) www2: (3 rows) www2: -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] degenerate performance on one server of 3
Erik Aronesty writes: > I have 3 servers, all with identical databases, and each performing > very differently for the same queries. I'm betting on varying degrees of table bloat. Have you tried vacuum full, cluster, etc? regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Scalability in postgres
2009/5/29 Scott Carey > > On 5/28/09 6:54 PM, "Greg Smith" wrote: > > > 2) You have very new hardware and a very old kernel. Once you've done > the > > above, if you're still not happy with performance, at that point you > > should consider using a newer one. It's fairly simple to build a Linux > > kernel using the same basic kernel parameters as the stock RedHat one. > > 2.6.28 is six months old now, is up to 2.6.28.10, and has gotten a lot > > more testing than most kernels due to it being the Ubuntu 9.04 default. > > I'd suggest you try out that version. > > > Comparing RedHat's 2.6.18, heavily patched, fix backported kernel to the > original 2.6.18 is really hard. Yes, much of it is old, but a lot of stuff > has been backported. > I have no idea if things related to this case have been backported. > Virtual > memory management is complex and only bug fixes would likely go in however. > But RedHat 5.3 for example put all the new features for Intel's latest > processor in the release (which may not even be in 2.6.28!). > > There are operations/IT people won't touch Ubuntu etc with a ten foot pole > yet for production. That may be irrational, but such paranoia exists. The > latest postgres release is generally a hell of a lot safer than the latest > linux kernel, and people get paranoid about their DB. > > If you told someone who has to wake up at 3AM by page if the system has an > error that "oh, we patched our own kenrel build into the RedHat OS" they > might not be ok with that. > > Its a good test to see if this problem is fixed in the kernel. I've seen > CentOS 5.2 go completely nuts with system CPU time and context switches > with > kswapd many times before. I haven't put the system under the same stress > with 5.3 yet however. > One of the server is: Intel Xeon X7350 2.93GHz, RH 5.3 and kernel 2.6.18-128.el5. and the perfonmace is bad too, so i don't think the probles is the kernel The two servers that I tested (HP-785 Opteron and IBM x3950 M2 Xeon) have NUMA architecture. and I thought the problem was caused by NUMA. http://archives.postgresql.org/pgsql-admin/2008-11/msg00157.php I'm trying another server, an HP blade bl 680 with Xeon E7450 (4 CPU x 6 cores= 24 cores) without NUMA architecture, but the CPUs are also going up. procs ---memory-- ---swap-- -io --system-- -cpu-- r b swpd free buff cache si sobibo in cs us sy id wa st 1 0 0 46949972 116908 1703296400153122 1 0 98 0 0 2 0 0 46945880 116916 170330680072 140 2059 3140 1 1 97 0 0 329 0 0 46953260 116932 170332080024 612 1435 194237 44 3 53 0 0 546 0 0 46952912 116940 1703320800 4 136 1090 327047 96 4 0 0 0 562 0 0 46951052 116940 1703322400 0 0 1095 323034 95 4 0 0 0 514 0 0 46949200 116952 1703321200 0 224 1088 330178 96 3 1 0 0 234 0 0 46948456 116952 1703321200 0 0 1106 315359 91 5 4 0 0 4 0 0 46958376 116968 170332720016 396 1379 223499 47 3 49 0 0 1 1 0 46941644 116976 1703322400 152 1140 2662 5540 4 2 93 1 0 1 0 0 46943196 116984 1703324800 104 604 2307 3992 4 2 94 0 0 1 1 0 46931544 116996 1703356800 104 4304 2318 3585 1 1 97 1 0 0 0 0 46943572 117004 170335680032 204 2007 2986 1 1 98 0 0 Now i don't think the probles is NUMA. The developer team will fix de aplication and then i will test again. I believe that when the application closes the connection the problem could be solved, and then 16 cores in a server does the work instead of a 32 or 24. Regards... --Fabrix
Re: [PERFORM] autovacuum hung?
Tom Lane [...@sss.pgh.pa.us] wrote: Are those processes actually doing anything, or just waiting? strace or local equivalent would be the most conclusive check. These must not have been hung, because they finally completed (after 10-15 hrs - some time between 11pm and 8am). Question is why does it take so long to do this on such a relatively small table? This query isn't very helpful because it fails to show locks that are not directly associated with tables. How can that (locks not directly associated...) be determined? Thanks, Brian -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] autovacuum hung?
Brian Cox writes: > Tom Lane [...@sss.pgh.pa.us] wrote: >> Are those processes actually doing anything, or just waiting? strace >> or local equivalent would be the most conclusive check. > These must not have been hung, because they finally completed (after > 10-15 hrs - some time between 11pm and 8am). Question is why does it > take so long to do this on such a relatively small table? They might have been blocked behind some other process that was sitting in an open transaction for some reason. The other likely cause is badly chosen autovacuum delay, but I think that was already covered. >> This query isn't very helpful because it fails to show locks that are >> not directly associated with tables. > How can that (locks not directly associated...) be determined? Don't assume every row in pg_locks has a join partner in pg_class. You could use an outer join ... regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] autovacuum hung?
Tom Lane [...@sss.pgh.pa.us] wrote: They might have been blocked behind some other process that was sitting in an open transaction for some reason. The other likely cause is badly chosen autovacuum delay, but I think that was already covered. Well, after I noticed this running for a while, I shutdown the postgres port and restarted postgres. The autovacuum of these tables kicked in promptly when postgres was back up. I then let them run. So, I don't think that surmise #1 is likely. As for #2, I'm using the default. These tables get updated once a day with each row (potentially) being updated 1-24 times over many minutes to a handful of hours. Dp you think it would be better to manually vacuum these tables? If so, would it be best to disable autovacuum of them? And while I'm at it, if you disable autovacuum of the master table will that disable it for the actual partitions? > Don't assume every row in pg_locks has a join partner in pg_class. You could use an outer join ... Yes, of course. It never occurred that there could be db locks not associated with tables. Thanks, Brian -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] autovacuum hung?
Brian Cox writes: > Dp you think it would be better to manually > vacuum these tables? If so, would it be best to disable autovacuum of > them? And while I'm at it, if you disable autovacuum of the master table > will that disable it for the actual partitions? No, no, and no. What would be best is to find out what actually happened. The evidence is gone now, but if you see it again please take a closer look. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] autovacuum hung?
Tom Lane [...@sss.pgh.pa.us] wrote: No, no, and no. What would be best is to find out what actually happened. The evidence is gone now, but if you see it again please take a closer look. OK. You mentioned strace. It's got a lot of options; any in particular that would be useful if this happens again? Brian -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] autovacuum hung?
Brian Cox writes: > OK. You mentioned strace. It's got a lot of options; any in particular > that would be useful if this happens again? I'd just do "strace -p processID" and watch it for a little while. If it's not hung, you'll see the process issuing kernel calls at some rate or other. If it is hung, you'll most likely see something like semop(...) and it just sits there. Also, if you see nothing but a series of select()s with varying timeouts, that would suggest a stuck spinlock (although I doubt that was happening, as it would eventually timeout and report a failure). regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] autovacuum hung?
Brian Cox writes: > OK. You mentioned strace. It's got a lot of options; any in particular > that would be useful if this happens again? Oh, and don't forget the more-complete pg_locks state. We'll want all the columns of pg_locks, not just the ones you showed before. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] degenerate performance on one server of 3
Tom Lane wrote: > Erik Aronesty writes: >> I have 3 servers, all with identical databases, and each performing >> very differently for the same queries. > > I'm betting on varying degrees of table bloat. Have you tried vacuum > full, cluster, etc? Or, if you have been using VACUUM FULL, try REINDEXing the tables, because it could easily be index bloat. Clustering the table will take care of index bloat as well as table bloat. -- Craig Ringer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] degenerate performance on one server of 3
Craig Ringer writes: > Tom Lane wrote: >> I'm betting on varying degrees of table bloat. Have you tried vacuum >> full, cluster, etc? > Or, if you have been using VACUUM FULL, try REINDEXing the tables, > because it could easily be index bloat. Clustering the table will take > care of index bloat as well as table bloat. Index bloat wouldn't explain the slow-seqscan behavior the OP was complaining of. Still, you're right that if the tables are bloated then their indexes probably are too ... and that VACUUM FULL alone will not fix that. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] degenerate performance on one server of 3
it was all vacuum full...thanks the other 2 servers truncate and reload that table from time to time ... IE: they are always vacuumed as the "master" ... that server never does it... hence the bloat but why wasn't autovac enough to reclaim at least *most* of the space? that table *does* get updated every day... but rows are not overwritten, just edited. it seems that most of the pages should be "reused" via autovac On Sun, May 31, 2009 at 11:40 PM, Tom Lane wrote: > Craig Ringer writes: >> Tom Lane wrote: >>> I'm betting on varying degrees of table bloat. Have you tried vacuum >>> full, cluster, etc? > >> Or, if you have been using VACUUM FULL, try REINDEXing the tables, >> because it could easily be index bloat. Clustering the table will take >> care of index bloat as well as table bloat. > > Index bloat wouldn't explain the slow-seqscan behavior the OP was > complaining of. Still, you're right that if the tables are bloated > then their indexes probably are too ... and that VACUUM FULL alone > will not fix that. > > regards, tom lane > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Vacuuming technique doubt
Having a doubt, we want to vacuum and reindex some 50 most used tables daily on specific time. Is it best to have a function in postgres and call it in cron or is there any other good way to do the two process for specified tables at specified time? -Arvind S * **"Many of lifes failure are people who did not realize how close they were to success when they gave up." -Thomas Edison*