Re: [PERFORM] limiting performance impact of wal archiving.
On Thu, Nov 12, 2009 at 3:21 PM, Laurent Laborde kerdez...@gmail.com wrote:
> Hi !
> Here is my plan :
> - rebuilding a spare with ext3, raid10, without lvm
> - switch the slony master to this new node.

Done 3 days ago : Problem solved ! It totally worked. \o/

--
ker2x
sysadmin DBA @ http://www.over-blog.com/

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Unexpected sequential scan on an indexed column
OK, I think that after reading this doc http://www.postgresql.org/files/developer/optimizer.pdf (which I hadn't encountered before) about the optimizer, something clicked in my brain and I think I can answer my own question. I was basically thinking from my own perspective rather than from the query planner's perspective:

- From my perspective, I know that the subselect will return very few values, so naively I expected that the planner would be able to do a bitmap index scan with the small set of values returned, without needing to do a join (such as the nested loop join it ended up choosing).

- However (and this is probably obvious to all of you), the query planner doesn't really know for a fact that a sub-select will result in a small number of rows, so it guesses based on its statistics what the best kind of join would be. A 'bitmap index scan' is not one of the choices for a join, I'm guessing, because a 'nested loop join with inner index scan' is a more generally applicable strategy that can get the same order of magnitude of performance in restriction cases that end up being as simple as an IN (list) restriction. However, there are more competing possibilities for picking an appropriate join strategy than for picking a strategy to apply an IN (list) restriction, so the planner may not pick the 'nested loop join with inner index scan' if the ANALYZE statistics don't guide it that way, even if that would be the best strategy in the end.

I guess the only way I can think of to make a generic planner that would have performed well even in the lopsided statistics case is to create some plan nodes with contingency conditions. E.g.:

Plan: nested loop join with sequential scan
Assumption: all table values are the same
Contingency plan: nested loop join with index scan

Then, if the assumption for the plan is violated early enough while executing the plan, the query executor would abort that plan node execution and start over with the contingency plan.
I guess implementing this kind of system in a generic way could get pretty hairy, and given my limited experience I don't know if the proportion of query plans that would be improved by having these kinds of contingency plans is significant enough to warrant the cost of developing this system, but I'm gathering that most query planners (including the postgres planner) don't do this kind of contingency planning :)

Thanks!
Eddy

On Sun, Nov 15, 2009 at 5:46 PM, Eddy Escardo-Raffo eesca...@kikini.com wrote:
> I was using VALUES in my examples to more closely mirror the results of a sub-select (I abstracted everything else away and noticed that even just using VALUES syntax instead of a sub-select, the performance was bad). The full statement I had that led me into this more narrow investigation in the first place looks more like:
>
> explain analyze SELECT u.userid FROM users u, (SELECT locid FROM locations WHERE ...) l WHERE u.location = l.locid LIMIT 10;
>
> Based on the investigation so far, it seems like this kind of statement will perform well when the users.location distribution is not overwhelmingly lopsided, but not otherwise. However, using the IN (list) notation with a list of integer literals seems to perform well no matter what the specific distribution of values in the users.location column is. I would like to understand why this is so, to help me write better queries in the future.
>
> Thanks,
> Eddy
>
> On Sun, Nov 15, 2009 at 5:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Eddy Escardo-Raffo eesca...@kikini.com writes:
>>> For C, the planner estimated 10 thousand rows. For D, the planner estimated 100 thousand rows, yet for E the planner estimated only 1 row, which is the closest to reality. So, is there any way to specify a query that has a SUB-SELECT that returns a small set of values so that the planner treats it similar to how it treats statement E, or does statement E get its additional edge precisely from the fact that the restriction is defined by integer literals?
Currently there is no attempt to look at the exact contents of a VALUES construct for planning purposes. For the examples you're showing it seems like the IN (list) notation is more compact and more widely used, so improving the VALUES alternative doesn't seem that exciting.

regards, tom lane
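To make Tom's point concrete, here is a sketch of the two formulations side by side (table and column names follow the thread's users/locations example; the literal location values 101-103 are made up):

```sql
-- IN (list): the planner sees the literal values and can estimate
-- selectivity for each one directly from the column statistics.
EXPLAIN ANALYZE
SELECT userid FROM users
WHERE location IN (101, 102, 103)
LIMIT 10;

-- VALUES: planned as a join against a row source whose contents the
-- planner does not inspect, so it falls back on generic estimates and
-- may pick a different join strategy on lopsided data.
EXPLAIN ANALYZE
SELECT u.userid
FROM users u
JOIN (VALUES (101), (102), (103)) AS l(locid) ON u.location = l.locid
LIMIT 10;
```

Comparing the two EXPLAIN outputs on a table with a heavily skewed users.location column should reproduce the row-estimate differences Eddy observed.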
Re: [PERFORM] Manual vacs 5x faster than autovacs?
Quoting Scott Marlowe scott.marl...@gmail.com:
> On Thu, Nov 12, 2009 at 9:58 AM, Wayne Beaver wa...@acedsl.com wrote:
>> Quoting Scott Marlowe scott.marl...@gmail.com:
>>> On Thu, Nov 12, 2009 at 9:14 AM, Wayne Beaver wa...@acedsl.com wrote:
>>>> I'd seen autovacs running for hours and had mis-attributed this to growing query times on those tables - my thought was that shrinking the tables more quickly could make them more-optimized, more often. Sounds like I could be chasing the wrong symptoms, though.
>>> Now it is quite possible that a slow autovac is causing your queries to run slower. So, if it has a moderate to high cost delay, then it might not be able to keep up with the job and your tables will become bloated. The problem isn't that autovac is stealing too many resources, it's that it's not stealing enough.
>> I've not yet gotten to your iostat inquiry from your previous response...
> Don't worry too much, just want to see if your IO system is maxed out.

$ iostat
Linux 2.6.18.8-0.9-default (myserver)   11/16/2009

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          28.11   3.13     6.50     8.71    0.00  53.56

Device:        tps   Blk_read/s   Blk_wrtn/s      Blk_read      Blk_wrtn
sda         153.08      7295.23      3675.59  123127895363   62036043656
Re: [PERFORM] Manual vacs 5x faster than autovacs?
On Mon, Nov 16, 2009 at 9:13 AM, Wayne Beaver wa...@acedsl.com wrote:
> Quoting Scott Marlowe scott.marl...@gmail.com:
>> Don't worry too much, just want to see if your IO system is maxed out.
>
> $ iostat
> Linux 2.6.18.8-0.9-default (myserver)   11/16/2009
>
> avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
>           28.11   3.13     6.50     8.71    0.00  53.56
>
> Device:        tps   Blk_read/s   Blk_wrtn/s      Blk_read      Blk_wrtn
> sda         153.08      7295.23      3675.59  123127895363   62036043656

That's just since the machine was turned on. Run it like:

iostat -x 10

and see what comes out after the first one.
Re: [PERFORM] Manual vacs 5x faster than autovacs?
Quoting Scott Marlowe scott.marl...@gmail.com:
> That's just since the machine was turned on. Run it like:
>
> iostat -x 10
>
> and see what comes out after the first one.

Duh! Sorry about that...

$ iostat -x 10
Linux 2.6.18.8-0.9-default (myserver)   11/16/2009

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          28.11   3.13     6.50     8.70    0.00  53.56

Device:  rrqm/s  wrqm/s     r/s    w/s   rsec/s   wsec/s    rkB/s    wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        3.20  406.34  100.74  52.33  7293.84  3675.79  3646.92  1837.90     71.66      0.07   2.15   0.90  13.71
Re: [PERFORM] Unexpected sequential scan on an indexed column
Hi Eddy

Perhaps a slightly naive suggestion: have you considered converting the query to a small stored procedure ('function' in Postgres speak)? You can pull the location values, and then iterate over a query like this:

select userid from users where location=:x

which is more-or-less guaranteed to use the index.

I had a somewhat similar situation recently, where I was passing in a list of id's (from outwith Postgres) and it would on occasion avoid the index in favour of a full table scan. I changed this to iterate over the id's with separate queries (in Java, but using a function will achieve the same thing) and went from one 5 minute query doing a full table scan to a handful of queries doing sub-millisecond direct index lookups.

Cheers
Dave
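A minimal sketch of the kind of function Dave describes (the function name is hypothetical; table and column names follow the thread's example schema, and the location filter is left elided as in Eddy's original query):

```sql
-- Iterating with one parameterized lookup per location value pins
-- each probe to the users.location index instead of leaving the
-- join strategy up to the planner's statistics.
CREATE FUNCTION users_by_location() RETURNS SETOF integer
LANGUAGE plpgsql STABLE AS '
DECLARE
  loc integer;
  uid integer;
BEGIN
  FOR loc IN SELECT locid FROM locations WHERE ...  -- original filter elided
  LOOP
    FOR uid IN SELECT userid FROM users WHERE location = loc LOOP
      RETURN NEXT uid;
    END LOOP;
  END LOOP;
  RETURN;
END;';
```

Each inner SELECT is a simple single-value equality restriction, which is the case the planner handles well regardless of how lopsided the overall distribution is.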
[PERFORM] Is Diskeeper Automatic Mode safe?
I've got a pair of servers running PostgreSQL 8.0.4 on Windows. We have several tables that add and delete massive amounts of data in a single day and are increasingly having a problem with drive fragmentation, and it appears to be giving us a decent performance hit. This is external fragmentation we are dealing with. We already vacuum the tables on a regular basis to reduce internal fragmentation as best as possible.

Currently I shut down the PostgreSQL service every few weeks and manually run a defragment of the drive, but this is getting tedious. Diskeeper has an Automatic Mode that runs in the background all the time to handle this for me. They advertise they are compatible with MS SQL Server, but don't appear to have any specific info on PostgreSQL.

I'm curious if anyone else has used Diskeeper's Automatic Mode in combination with PostgreSQL to defrag and keep the drive defragged while PostgreSQL is actually running.

Thanks!

-chris
www.mythtech.net
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
On Mon, Nov 16, 2009 at 12:14 PM, cb c...@mythtech.net wrote:
> I've got a pair of servers running PostgreSQL 8.0.4 on Windows.
> [...]
> I'm curious if anyone else has used Diskeeper's Automatic Mode in combination with PostgreSQL to defrag and keep the drive defragged while PostgreSQL is actually running.

I'm not sure what the answer is to your actual question, but I'd highly recommend upgrading to 8.3 or 8.4. The performance is likely to be a lot better, and 8.0/8.1 are no longer supported on Windows.

...Robert
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
cb wrote:
> I'm curious if anyone else has used Diskeeper's Automatic Mode in combination with PostgreSQL to defrag and keep the drive defragged while PostgreSQL is actually running.

I've been a Diskeeper customer for about 10 years now and consider it 'must have' software for Windows machines. I do not work for them nor get paid by them, I just find the software incredibly valuable. I'm running XP 64-bit with 8.4.0 and Diskeeper does a wonderful job of defragmenting the database tables when they get fragmented. I just checked their website and the 2009 version is still listed. I've been running the 2010 Enterprise Server version for about a week and I can tell you that it's great! (I'm actually running it on 3 servers but only mine has PG.)

The main difference with the 2010 version is something that they call IntelliWrite. As everyone knows, one of the biggest problems with the Windows OS is that it lets fragmentation occur in the first place. This new IntelliWrite actually prevents the fragmentation from occurring (or at least the vast majority of it). The auto defrag takes care of the rest. I can attest to this actually working in real life scenarios.

The other thing Diskeeper has is something they call I-FAAST. What this does is monitor file usage and move the most heavily accessed files to the fastest part of the drive. My db is on an Adaptec 52445 with 16 ST373455SS (15K5) drives in RAID5, and Diskeeper defrags and moves pretty much everything in \data\base to the outer part of the drive.

So the short answer is yes, I have it running with PostgreSQL and have not had any problems.

Bob
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
On Mon, Nov 16, 2009 at 1:11 PM, Robert Schnabel schnab...@missouri.edu wrote:
> I've been a Diskeeper customer for about 10 years now and consider it 'must have' software for Windows machines.
> [...]
> So the short answer is yes, I have it running with PostgreSQL and have not had any problems.

Have you unplugged the power cord a few times in the middle of heavy write activity?

...Robert
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
>> So the short answer is yes, I have it running with PostgreSQL and have not had any problems.
>
> Have you unplugged the power cord a few times in the middle of heavy write activity?
>
> ...Robert

Nope. Forgive my ignorance, but isn't that what a UPS is for anyway? Along with a BBU controller.

Bob
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
On Mon, Nov 16, 2009 at 1:04 PM, Robert Schnabel schnab...@missouri.edu wrote:
>>> So the short answer is yes, I have it running with PostgreSQL and have not had any problems.
>> Have you unplugged the power cord a few times in the middle of heavy write activity?
> Nope. Forgive my ignorance but isn't that what a UPS is for anyway? Along with a BBU controller.

BBU controller, yes. UPS, no. I've seen more than one multi-million dollar hosting center go down from something as simple as a piece of wire flying into a power conditioner, shorting it out, and feeding back and blowing every single power conditioner and UPS AND the switch that allowed the diesel to come into the loop. All failed. Every machine lost power. One database server out of a few dozen came back up. In fact there were a lot of different dbm systems running in that center, and only the pg 7.2 version came back up unscathed. Because someone insisted on pulling the plug out from the back a dozen or so times to make sure it would come back up. PG saved our shorts and the asses they contain.

Sad thing is I'm sure the other servers COULD have come back up if they had been running proper BBUs and hard drives that didn't lie about fsync, and an OS that enforced fsync properly, at least for scsi, at the time.

Power supplies / UPSes fail far more often than one might think. And a db that doesn't come back up afterwards is not to be placed into production.
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
On Mon, Nov 16, 2009 at 1:12 PM, Scott Marlowe scott.marl...@gmail.com wrote:
> Power supplies / UPSes fail far more often than one might think. And a db that doesn't come back up afterwards is not to be placed into production.

Note that there are uses for databases that can lose everything and just initdb and be happy. Session databases are like that. But I'm talking persistent databases.
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
Robert Schnabel wrote:
> Nope. Forgive my ignorance but isn't that what a UPS is for anyway? Along with a BBU controller.

If you have a UPS *and* a BBU controller, then things are much better--those should have a write cache that insulates you from the worst of the problems. But just a UPS alone doesn't help you very much:

1) A UPS is built with a consumable (the battery), and they do wear out. Unless you're proactive about monitoring UPS battery quality and doing tests, you won't find this out until the first time the power goes out and the UPS doesn't work anymore.

2) Do you trust that the UPS integration software will *always* shut the server down before the power goes out? You shouldn't.

3) Ever had someone trip over the cord between the UPS and the server? How about accidentally unplugging the wrong server? These things happen; do you want data corruption when they do?

4) There are all sorts of major electrical problems you can run into (around here it's mainly summer lightning) that will blow out a UPS without giving an opportunity for graceful shutdown.

If there's anyone who thinks a UPS is all you need to be safe from power issues, I know a guy named Murphy you should get introduced to.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
Greg Smith wrote:
> Robert Schnabel wrote:
>> Nope. Forgive my ignorance but isn't that what a UPS is for anyway? Along with a BBU controller.
> If you have a UPS *and* a BBU controller, then things are much better--those should have a write cache that insulates you from the worst of the problems. But just a UPS alone doesn't help you very much:

A UPS is just a controlled shutdown device for when things are working and mains power goes off. Note the "when things are working" qualifier. :)

--
Karl
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
My reply about server failure was showing what could go wrong at the server level assuming a first-class, properly run data center, with fully redundant power, including a server with dual power supplies on separate cords fed by separate UPSes etc. Unfortunately, *correctly* configured A/B power is all too rare these days. Some examples of foo that I've seen at professional data centers:

- Allegedly A/B power supplied from two phases of the same UPS (which was then taken down due to a tech's error during hot maintenance)
- A/B power fed through a common switch panel
- A/B power with dual attached servers, with each power feed running a steady 60% load (do the math!)

A classic piece of foo from a manufacturer: Dell supplies their low end dual-power rackmount boxes with a Y shaped IEC cable ... clearly, this is only suitable for non-redundant use, but I've seen plenty of them deployed in data centers by less-than-clueful admins.

On Mon, Nov 16, 2009 at 2:12 PM, Scott Marlowe scott.marl...@gmail.com wrote:
> BBU controller, yes. UPS, no. I've seen more than one multi-million dollar hosting center go down from something as simple as a piece of wire flying into a power conditioner, shorting it out, and feeding back and blowing every single power conditioner and UPS AND the switch that allowed the diesel to come into the loop. All failed. Every machine lost power.
> [...]
> Power supplies / UPSes fail far more often than one might think. And a db that doesn't come back up afterwards is not to be placed into production.
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
Scott Marlowe wrote:
> Power supplies / UPSes fail far more often than one might think. And a db that doesn't come back up afterwards is not to be placed into production.

Ok, so you have sufficiently sparked my curiosity as to whether Diskeeper will in any way cause Postgres to fail the power cord test. Unfortunately I have some deadlines to meet so won't be able to test this out until later in the week. I'm in the fortunate position that the only person that uses my db is me, myself and I, so I can control what and when it does work. I also have backup software running that does complete drive imaging, so I should be able to do this fairly safely. Here is the plan...

1) Shut down the Diskeeper service, run a query that is write heavy and then pull the cord on the box. Wait a few minutes, then plug it back in and see if it recovers.
2) Leave Diskeeper running and repeat the above...

Comments/suggestions? If I'm going to do this I'd like to make sure I do it correctly so it will be useful for the group. I'm using XP 64 bit, Adaptec 52445 + BBU, I have two external drive enclosures (8 drives each) plus the 8 in the box, pg 8.4.0.

Bob
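For the write-heavy step, one way to have sustained write activity in flight when the cord is pulled is a throwaway bulk insert (the table name and row count here are arbitrary; just a sketch):

```sql
-- Disposable table; the INSERT is sized to run for minutes, keeping
-- heavy WAL and data-file writes going while the plug is pulled
-- mid-statement.
CREATE TABLE plug_test (id serial PRIMARY KEY, payload text);

INSERT INTO plug_test (payload)
SELECT repeat('x', 100)
FROM generate_series(1, 10000000);

-- After power-up, check that the server recovers and the interrupted
-- statement rolled back cleanly (the count should reflect only
-- committed work):
SELECT count(*) FROM plug_test;
```

Running the same load with and without the Diskeeper service active would give the two test conditions a comparable workload.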
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
Dave Crooke wrote:
> My reply about server failure was showing what could go wrong at the server level assuming a first-class, properly run data center, with fully redundant power, including a server with dual power supplies on separate cords fed by separate UPSes etc.

Never had a motherboard short out either, eh? China makes really GOOD electrolytic caps these days (I can show you several SERVER CLASS boards that were on conditioned power and popped 'em, rendering the board dead instantly.)

Murphy is a bastard.

--
Karl
Re: [PERFORM] Unexpected sequential scan on an indexed column
Yeah, this kind of thing would probably work. Doing this in Java with separate queries would be easy to code but require multiple round trips. Doing it as a stored procedure would be nicer, but I'd have to think a little more about how to refactor the Java code around the query to make this happen.

Thanks for the suggestion.

Eddy

On Mon, Nov 16, 2009 at 9:44 AM, Dave Crooke dcro...@gmail.com wrote:
> Hi Eddy
>
> Perhaps a slightly naive suggestion: have you considered converting the query to a small stored procedure ('function' in Postgres speak)? You can pull the location values, and then iterate over a query like this:
>
> select userid from users where location=:x
>
> which is more-or-less guaranteed to use the index.
> [...]
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
On Mon, Nov 16, 2009 at 1:32 PM, Robert Schnabel schnab...@missouri.edu wrote:
> Ok, so you have sufficiently sparked my curiosity as to whether Diskeeper will in any way cause Postgres to fail the power cord test. Unfortunately I have some deadlines to meet so won't be able to test this out until later

Best time is during acceptance testing before deployment. Failing that, test it in production on the backup server so you can burn it to the ground and rebuild it on a Saturday.

Note that surviving the power plug being pulled doesn't PROVE your system will always do that. You can try to simulate the real mix of load and even replay queries when pulling the plug, only to find the one corner case you didn't test in production when power is lost. The power cord plug can prove a system bad, but you're still somewhat hoping it's really good, with a high probability of being right. Which is why backup is so important.
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
On Mon, Nov 16, 2009 at 1:32 PM, Karl Denninger k...@denninger.net wrote:
> Never had a motherboard short out either, eh? China makes really GOOD electrolytic caps these days (I can show you several SERVER CLASS boards that were on conditioned power and popped 'em, rendering the board dead instantly.)
>
> Murphy is a bastard.

You know about the whole capacitor caper from a few years back, where this one plant was making corrosive electrolyte and a huge number of capacitor suppliers were buying from them? Mobos from that era are terrible. Caps that expand and burst after anywhere from a few months to a few years of use. Ugh.
Re: [PERFORM] Unexpected sequential scan on an indexed column
On Mon, Nov 16, 2009 at 12:45:46PM -0800, Eddy Escardo-Raffo wrote:
> Yeah, this kind of thing would probably work. Doing this in Java with separate queries would be easy to code but require multiple round trips. Doing it as a stored procedure would be nicer, but I'd have to think a little more about how to refactor the Java code around the query to make this happen.
>
> Thanks for the suggestion.
> Eddy

Hi Eddy,

Here is a lookup wrapper that is used in DSPAM to work around a similar problem. Maybe you can use it as a template for your function:

create function lookup_tokens(integer, bigint[])
returns setof dspam_token_data
language plpgsql stable
as '
declare
  v_rec record;
begin
  for v_rec in
    select * from dspam_token_data
     where uid = $1
       and token in
           (select $2[i]
              from generate_series(array_lower($2,1),
                                   array_upper($2,1)) s(i))
  loop
    return next v_rec;
  end loop;
  return;
end;';

Regards,
Ken
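For reference, a call to a wrapper like this would look something like the following (the uid and token values are made up for illustration):

```sql
-- Returns the dspam_token_data rows for uid 1 whose token is in the
-- passed-in array, one index probe per element.
SELECT * FROM lookup_tokens(1, ARRAY[8912345678901, 8912345678902]);
```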
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
Scott Marlowe wrote: On Mon, Nov 16, 2009 at 1:32 PM, Robert Schnabel schnab...@missouri.edu wrote: Ok, so you have sufficiently sparked my curiosity as to whether Diskeeper will in any way cause Postgres to fail the power cord test. Unfortunately I have some deadlines to meet so won't be able to test this out until later. Best time is during acceptance testing, before deployment. Failing that, test it in production on the backup server so you can burn it to the ground and rebuild it on a Saturday. Note that surviving the power plug being pulled doesn't PROVE your system will always do that. You can try to simulate the real mix of load and even replay queries when pulling the plug, only to find the one corner case you didn't test in production when power is lost. The power-cord test can prove a system bad, but you're still somewhat "hoping" it's really good, with a high probability of being right. Which is why backup is so important. Granted, but the point of me testing this is to say whether or not the Diskeeper service could introduce a problem. If the system recovers without Diskeeper running but does not recover while Diskeeper is actively running then we have a problem. If they both recover then I've answered the question "Have you unplugged the power cord a few times in the middle of heavy write activity?" I understand that we can't prove that it works, but I should be able to at least answer the question asked. I wouldn't consider my database a production one. I basically use it to store a large amount of genetic data for my lab. The only time the database gets use is when I use it. Short of frying a piece of hardware by pulling the plug I'm not worried about losing any data, and rebuilding is actually quite a simple process that only takes about 2 hours... been there, done that when I pulled the wrong SAS connector. Bob
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
I also have backup software running that does complete drive imaging, so I should be able to do this fairly safely. Here is the plan...

1) Shut down the Diskeeper service, run a query that is write-heavy, and then pull the cord on the box. Wait a few minutes, then plug it back in and see if it recovers.
2) Leave Diskeeper running and repeat the above...

Comments/suggestions? If I'm going to do this I'd like to make sure I do it correctly so it will be useful for the group. Do it more than once. This is a highly erratic test that can catch your system at a wide variety of points, some of which cause no problems, and some of which can be catastrophic. If you test and it fails, you know you have a problem. If you test and it doesn't fail, you don't know much. It's only when you've tested a number of times without failure that you've gained any real knowledge. Craig
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
Craig James wrote: Do it more than once. This is a highly erratic test that can catch your system at a wide variety of points, some of which cause no problems, and some of which can be catastrophic. If you test and it fails, you know you have a problem. If you test and it doesn't fail, you don't know much. It's only when you've tested a number of times without failure that you've gained any real knowledge. Of course, you're only truly safe when you've tested an infinite number of times, which may take a bit longer than management expects. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
On Mon, Nov 16, 2009 at 03:20:12PM -0500, Greg Smith wrote: Robert Schnabel wrote: Nope. Forgive my ignorance, but isn't that what a UPS is for anyway? Along with a BBU controller. If you have a UPS *and* a BBU controller, then things are much better--those should have a write cache that insulates you from the worst of the problems. But just a UPS alone doesn't help you very much: 1) A UPS is built around a consumable (the battery), and batteries do wear out. Unless you're proactive about monitoring UPS battery quality and doing tests, you won't find this out until the first time the power goes out and the UPS doesn't work anymore. Well, the BBU is just another battery (OK, some are capacitors, but...), so the same caveats apply to a BBU RAID card. We test ours every 6 months and fail them if they hold less than 5 days of capacity (enough to cover a failure over a long weekend: 3 days, plus 1-2 days to fix the issue, e.g. replace a power supply, motherboard, etc.). -- rouilj John Rouillard System Administrator Renesys Corporation 603-244-9084 (cell) 603-643-9300 x 111
Re: [PERFORM] Unexpected sequential scan on an indexed column
This is incredibly helpful, Kenneth. I didn't know about the SETOF syntax at all. This could help minimize the amount of refactoring I need to do. Thanks! Eddy On Mon, Nov 16, 2009 at 12:55 PM, Kenneth Marshall k...@rice.edu wrote: snip
Re: [PERFORM] Unexpected sequential scan on an indexed column
With Postgres, you can transparently replace a regular select with a function that takes the same types and returns a record iterator with the same columns. The only change needed is the SQL used to invoke it; you won't need any logic changes in your app code (Java or whatever), e.g.

  select ... where x = :x ... (select ... where ... y = :y)

becomes

  select myfunction(:x, :y)

On Mon, Nov 16, 2009 at 2:45 PM, Eddy Escardo-Raffo eesca...@kikini.com wrote: snip
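Concretely, the drop-in replacement Dave describes might look like this; the table and column names (users, locations, location, country) are hypothetical, following the users/location example upthread:

```sql
-- Wrapper that iterates over the small set of matching locations and
-- does an indexed lookup for each, returning the same columns as the
-- original SELECT so only the SQL string in the app code changes.
CREATE FUNCTION users_at_locations(p_country int)
RETURNS SETOF users
LANGUAGE plpgsql STABLE AS '
DECLARE
  v_loc record;
  v_user record;
BEGIN
  FOR v_loc IN SELECT location_id FROM locations
               WHERE country = p_country LOOP
    FOR v_user IN SELECT * FROM users
                  WHERE location = v_loc.location_id LOOP
      RETURN NEXT v_user;
    END LOOP;
  END LOOP;
  RETURN;
END;';

-- Invoked in place of the original join/subselect:
SELECT * FROM users_at_locations(42);
```

Note that in 8.0-era plpgsql the function body is a quoted string (dollar-quoting with $$ also works from 8.0 onward and avoids the quote-doubling headaches).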
Re: [PERFORM] Unexpected sequential scan on an indexed column
Thanks, Dave. Eddy On Mon, Nov 16, 2009 at 1:52 PM, Dave Crooke dcro...@gmail.com wrote: snip
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
On Mon, Nov 16, 2009 at 2:04 PM, Robert Schnabel schnab...@missouri.edu wrote: Granted, but the point of me testing this is to say whether or not the Diskeeper service could introduce a problem. snip Be careful: it's not uncommon for a database / app to suddenly become popular, and then people start expecting it to be up all the time. For things like a corporate intranet, losing a few hours' work from something like a power loss is acceptable. We have four types of dbs where I work. Session servers can be configured with fsync off and don't have to be ultra-reliable under things like power loss. The search database gets recreated every few days as the indexer runs. The stats database, where reliability is sorta important but not life or death. And the user data database, which has to work and stay up. So each one is tested differently, because each one would have a much different impact if it crashed and couldn't come back up without help.
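For the record, the session-server tuning Scott describes mostly comes down to a couple of postgresql.conf settings; a sketch for a server whose data can be rebuilt from scratch (values are illustrative, not recommendations):

```
# postgresql.conf fragment for a rebuildable session/cache server.
# fsync = off means a power loss can corrupt the whole cluster --
# only acceptable where the data is disposable.
fsync = off

# (8.3 and later) synchronous_commit = off is a much safer middle
# ground: a crash can lose the last few transactions, but it cannot
# corrupt the database.
synchronous_commit = off
```

For the user-data database in his taxonomy, both of these stay on.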
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
On Nov 16, 2009, at 1:09 PM, Robert Haas wrote: I'm not sure what the answer is to your actual question, but I'd highly recommend upgrading to 8.3 or 8.4. The performance is likely to be a lot better, and 8.0/8.1 are no longer supported on Windows. Ugh, yeah, I'd love to upgrade but the powers that get to make that decision have no interest in upgrading. So I'm stuck on 8.0.4, and since I really don't do the PG support itself, I don't even get to voice much of an opinion (I deal really just with making sure the physical hardware is doing what it needs to do, which is where the disk defrag comes in to play). -chris www.mythtech.net
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
On Nov 16, 2009, at 1:11 PM, Robert Schnabel wrote: I've been a Diskeeper customer for about 10 years now and consider it 'must have' software for Windows machines. snip So the short answer is yes, I have it running with PostgreSQL and have not had any problems. So that seems to be a definite vote for "it should be just fine." I've read the other posts and I understand the concerns that were raised. I may try to do some testing myself since, other than the one "yes," there isn't anyone else jumping in to say they are doing it safely. Of course there is also no one saying "don't do it," just statements of caution, as it appears to be an unknown and has the potential to cause problems. It looks like to be really safe I should do some failure testing on my end first. Thanks to everyone for their input! -chris www.mythtech.net
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
cb c...@mythtech.net writes: Ugh, yeah, I'd love to upgrade but the powers that get to make that decision have no interest in upgrading. So I'm stuck on 8.0.4, Make sure you're not in the line of fire when (not if) that version eats your data. Particularly on Windows, insisting on not upgrading that version is unbelievably, irresponsibly stupid. There are a *large* number of known bugs. regards, tom lane
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
Tom Lane wrote: cb c...@mythtech.net writes: Ugh, yeah, I'd love to upgrade but the powers that get to make that decision have no interest in upgrading. So I'm stuck on 8.0.4, Make sure you're not in the line of fire when (not if) that version eats your data. Particularly on Windows, insisting on not upgrading that version is unbelievably, irresponsibly stupid. There are a *large* number of known bugs. Yeah, the prudent thing to do in your situation is to issue a CYA memo that says something like "I think the hardware is OK, but due to the large number of bugs in PostgreSQL 8.0.4 on Windows it's easy for the database to become corrupted anyway," point toward http://www.postgresql.org/docs/8.4/static/release.html to support that claim, note that 8.0.22 is the absolute minimum version anyone should be running, then CC everyone up the management chain. You're using a version that considers your data quite tasty and would like to make a snack of it at the first opportunity that arises. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
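For the memo itself, it's easy to document exactly what is running straight from the server, which puts the version on record in black and white:

```sql
-- Ask the server for its exact version string; on the box in
-- question this will name 8.0.4, the platform, and the compiler.
SELECT version();
```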
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
On Mon, Nov 16, 2009 at 7:45 PM, Greg Smith g...@2ndquadrant.com wrote: snip Last job I worked we had pgsql and a Big Commercial Database, and the three other DBAs, who worked mostly on that other database, were scared to death of patches to their DBMS. Thank the gods that pgsql updates are quite possibly the most reliable and easy to apply of any system. Read the release notes, and 99% of the time it's just: shut down, rpm -Uvh postgres*rpm, start up, and voilà, you're up to date. Pg updates are focused on security and bug fixes that don't change accepted behaviour within a major version. I agree, not applying them verges on negligence. Especially if you haven't read the release notes to see what was fixed. Sometimes I read them and don't worry about it if it's a real esoteric bug. But when a data-loss bug shows up I upgrade right away.
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
On Nov 16, 2009, at 8:31 PM, Tom Lane wrote: Make sure you're not in the line of fire when (not if) that version eats your data. Particularly on Windows, insisting on not upgrading that version is unbelievably, irresponsibly stupid. There are a *large* number of known bugs. I hear ya, and have agreed with you for a long while. There is a fairly regular and constant fight in house over the issue of upgrading. We get hit on a regular basis with problems that, as far as I know, are bugs that have been fixed (transaction log rename crashes that take down PG, as well as queries just vanishing into the aether at times of heavy load, resulting in hung threads in our Tomcat front end as it waits for something to come back that has disappeared). On Nov 16, 2009, at 9:45 PM, Greg Smith wrote: snip Myself and the other guy responsible for the underlying hardware have already gone down this route. The big bosses know our stance and know it isn't us preventing the upgrade. After that, there isn't too much more I can do except sit back and shake my head each time something goes wrong and I get sent on a wild goose chase to find any reason for the failure OTHER than PG. Really it comes down to the DBAs having a firm stance of "nothing changes, ever." Me, I say bug fixes are released for a reason.
My understanding is that, before I joined the company, they did an upgrade from 7 on Linux to 8 on Windows and got bit by some change in PG that broke a bunch of code. After that, they have just refused to budge from the 8.0.4 version we are on and know the code works against. I don't really have any details beyond that, and asking for them tends to invoke religious wars in house between the Linux/Open Source people and the Windows/Buy Everything people. So I've given up fighting, covered my butt, and just do the best I can to keep things running. Thanks again for the insights! -chris www.mythtech.net