Re: [PERFORM] Query only slow on first run
> You're essentially asking for a random sample of data that is not currently in memory. You're not going to get that without some I/O.

No, that sounds reasonable enough. But do you agree with the statement that my query will just get slower and slower over time as the number of posts increases, while the part having status = 1 is constant? (That is, as the relevant fraction becomes smaller over time, the "Filter: status = 1" step becomes slower.)
Re: [PERFORM] TB-sized databases
Simon Riggs [EMAIL PROTECTED] writes:
> On Wed, 2007-11-28 at 14:48 +0100, Csaba Nagy wrote:
>> In fact an even more useful option would be to ask the planner to throw error if the expected cost exceeds a certain threshold...
>
> Well, I've suggested it before: statement_cost_limit on pgsql-hackers, 1 March 2006. Would people like me to re-write and resubmit this patch for 8.4?
>
> Tom's previous concerns were along the lines of "How would you know what to set it to?", given that the planner costs are mostly arbitrary numbers.

Hm, that's only kind of true. Since 8.mumble, seq_page_cost is itself configurable, meaning you can adjust the base unit and calibrate all the parameters to be time in whatever unit you choose. But even assuming you haven't adjusted seq_page_cost and all the other parameters to match, the numbers aren't entirely arbitrary. They represent time in units of however long a single sequential page read takes. Obviously few people know how long such a page read takes, but surely you would just run a few sequential reads of large tables and set the limit to some multiple of whatever you find.

This isn't going to be precise to the level of being able to avoid executing any query which will take over 1000 ms. But it is going to be able to catch unconstrained cross joins or large sequential scans or such.

-- Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!
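A rough sketch of that calibration, run from psql against a hypothetical large table named bigtable (statement_cost_limit is only the proposed GUC from the patch above, not a released setting):

    \timing
    SELECT count(*) FROM bigtable;           -- wall-clock time for a full sequential read
    EXPLAIN SELECT count(*) FROM bigtable;   -- planner's estimated cost for the same scan
    -- cost units per millisecond ~= estimated cost / measured milliseconds;
    -- convert the longest runtime you are willing to tolerate into cost units
    -- and set the limit to some multiple of that, e.g.:
    -- SET statement_cost_limit = 10000000;  -- hypothetical GUC from the proposed patch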
[PERFORM] 7.4 Checkpoint Question
I have a legacy system still on 7.4 (I know, I know... the upgrade is coming soon). I have a fairly big spike happening once a day, every day, at the same time. It happens during a checkpoint, no surprise there. I know the solution to the problem (upgrade to a modern version), but what I'm looking for is an explanation as to why one particular checkpoint would be so bad on a low volume system, so I can appease certain management concerns.

This is a _really_ low volume system, less than 500 writes/hour. Normal operation sees checkpoint-related spikes of around 200-300 milliseconds. We always checkpoint at the checkpoint timeout (every 5 minutes). During this one checkpoint, I'm seeing transactions running 2-3 seconds. During this time, writes are 5/minute.

Relevant settings:
shared_buffers = 1
checkpoint_segments = 30
checkpoint_timeout = 300

What gives?

-- Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Query only slow on first run
cluster [EMAIL PROTECTED] writes:
>> You're essentially asking for a random sample of data that is not currently in memory. You're not going to get that without some I/O.
>
> No, that sounds reasonable enough. But do you agree with the statement that my query will just get slower and slower over time as the number of posts increases while the part having status = 1 is constant?

No, not as long as it sticks to that plan. The time's basically determined by the number of aggregate rows the LIMIT asks for, times the average number of post rows per aggregate group. And as far as you said, the latter number is not going to increase.

regards, tom lane
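With illustrative numbers plugged into that estimate (both figures are made up):

    rows visited ~= (groups requested by LIMIT) * (average post rows per group)
                 ~= 10 * 50 = 500 rows, regardless of how large the posts table grows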
Re: [PERFORM] TB-sized databases
Gregory Stark [EMAIL PROTECTED] writes:
> Simon Riggs [EMAIL PROTECTED] writes:
>> Tom's previous concerns were along the lines of "How would you know what to set it to?", given that the planner costs are mostly arbitrary numbers.
>
> Hm, that's only kind of true.

The units are not the problem. The problem is that you are staking non-failure of your application on the planner's estimates being pretty well in line with reality. Not merely in line enough that it picks a reasonably cheap plan, but in line enough that if it thinks plan A is 10x more expensive than plan B, then the actual ratio is indeed somewhere near 10.

Given that this list spends all day every day discussing cases where the planner is wrong, I'd have to think that that's a bet I wouldn't take. You could probably avoid this risk by setting the cutoff at something like 100 or 1000 times what you really want to tolerate, but how useful is it then?

regards, tom lane
Re: [PERFORM] 7.4 Checkpoint Question
On Thu, Nov 29, 2007 at 10:10:54AM -0500, Brad Nicholson wrote:
> This is a _really_ low volume system, less than 500 writes/hour. Normal operation sees checkpoint-related spikes of around 200-300 milliseconds. We always checkpoint at the checkpoint timeout (every 5 minutes). During this one checkpoint, I'm seeing transactions running 2-3 seconds. During this time, writes are 5/minute.
>
> What gives?

pg_dump? Remember that it takes special locks approximately equivalent (actually equivalent? I forget) to SERIALIZABLE mode, which makes things rather different.

A

-- Andrew Sullivan
Old sigs will return after re-constitution of blue smoke
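A quick way to test that theory while the spike is happening is to look at activity and locks from another session (checking cron for a pg_dump scheduled at that hour is the other obvious test). On 7.4, current_query is only populated if stats_command_string is enabled, and the column names below are the 7.4-era ones:

    SELECT procpid, usename, current_query
      FROM pg_stat_activity;               -- is a pg_dump (or anything else) running right now?

    SELECT l.pid, l.mode, l.granted, c.relname
      FROM pg_locks l
      LEFT JOIN pg_class c ON c.oid = l.relation;   -- unusual lock traffic at checkpoint time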
Re: [PERFORM] 7.4 Checkpoint Question
On Thu, 2007-11-29 at 10:10 -0500, Brad Nicholson wrote:
> I have a fairly big spike happening once a day, every day, at the same time. It happens during a checkpoint, no surprise there. [...] what I'm looking for is an explanation as to why one particular checkpoint would be so bad on a low volume system [...]
>
> Normal operation sees checkpoint-related spikes of around 200-300 milliseconds. We always checkpoint at the checkpoint timeout (every 5 minutes). During this one checkpoint, I'm seeing transactions running 2-3 seconds. During this time, writes are 5/minute.
>
> What gives?

If the timing is regular, it's most likely a human-initiated action rather than a behavioural characteristic. VACUUM runs in the background at that time and updates loads of blocks, which then need to be written out at checkpoint time. That slows queries down at that time, but not at others.

-- Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
Re: [PERFORM] TB-sized databases
On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote:
> Gregory Stark [EMAIL PROTECTED] writes:
>> Simon Riggs [EMAIL PROTECTED] writes:
>>> Tom's previous concerns were along the lines of "How would you know what to set it to?", given that the planner costs are mostly arbitrary numbers.
>> Hm, that's only kind of true.
>
> The units are not the problem. The problem is that you are staking non-failure of your application on the planner's estimates being pretty well in line with reality. Not merely in line enough that it picks a reasonably cheap plan, but in line enough that if it thinks plan A is 10x more expensive than plan B, then the actual ratio is indeed somewhere near 10.
>
> Given that this list spends all day every day discussing cases where the planner is wrong, I'd have to think that that's a bet I wouldn't take.

I think you have a point, but the alternative is often much worse.

If an SQL statement fails because of too high a cost, we can investigate the problem and re-submit. If a website slows down because somebody allowed a very large query to execute, then everybody is affected, not just the person who ran the bad query. Either way the guy that ran the query loses, but without constraints in place one guy can kill everybody else too.

> You could probably avoid this risk by setting the cutoff at something like 100 or 1000 times what you really want to tolerate, but how useful is it then?

Still fairly useful, as long as we understand it's a blunt instrument. If the whole performance of your system depends upon indexed access, then rogue queries can have disastrous, unpredictable consequences. Many sites construct their SQL dynamically, so a mistake in a seldom-used code path can let killer queries through. Even the best DBAs have been known to make mistakes.

e.g. An 80GB table has 8 million blocks in it.
- So putting statement_cost_limit = 1 million would allow some fairly large queries but prevent anything that did a SeqScan (or worse).
- Setting it to 10 million is going to prevent things like sorting the whole table without a LIMIT.
- Setting it at 100 million is going to prevent unconstrained product joins etc.

-- Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
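To make that arithmetic concrete (big_table is a placeholder name; the formula is the planner's basic sequential-scan estimate under the default cost parameters):

    -- a plain sequential scan is costed roughly as
    --   relpages * seq_page_cost + reltuples * cpu_tuple_cost
    -- so with seq_page_cost = 1 its cost is at least the table's block count
    SELECT relname, relpages, reltuples
      FROM pg_class
     WHERE relname = 'big_table';
    EXPLAIN SELECT * FROM big_table;   -- compare the top-line cost estimate
    -- a statement_cost_limit (still a hypothetical GUC) set below relpages would
    -- therefore reject any plan that sequentially scans this table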
Re: [PERFORM] 7.4 Checkpoint Question
On Thu, 2007-11-29 at 16:14 +0000, Simon Riggs wrote:
> On Thu, 2007-11-29 at 10:10 -0500, Brad Nicholson wrote:
>> I have a fairly big spike happening once a day, every day, at the same time. It happens during a checkpoint, no surprise there. [...]
>
> If the timing is regular, it's most likely a human-initiated action rather than a behavioural characteristic. VACUUM runs in the background at that time and updates loads of blocks, which then need to be written out at checkpoint time. That slows queries down at that time, but not at others.

Bingo. The big daily vacuum completes shortly before this checkpoint.

Thanks.

-- Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] TB-sized databases
* Simon Riggs ([EMAIL PROTECTED]) wrote:
> On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote:
>> Given that this list spends all day every day discussing cases where the planner is wrong, I'd have to think that that's a bet I wouldn't take.
>
> I think you have a point, but the alternative is often much worse.

I'm not convinced you've sufficiently outlined the consequences of implementing a plan cost limit.

> If an SQL statement fails because of too high a cost, we can investigate the problem and re-submit. If a website slows down because somebody allowed a very large query to execute, then everybody is affected, not just the person who ran the bad query. Either way the guy that ran the query loses, but without constraints in place one guy can kill everybody else too.

It's entirely possible (likely, even) that most of the users accessing a webpage are using the same queries and the same tables. If the estimates for those tables end up changing enough that PG adjusts the plan cost to be above the plan cost limit, then *all* of the users would be affected. The plan cost isn't going to change for just one user if it's the same query that a bunch of users are using. I'm not sure that handling the true 'rogue query' case with this limit would actually be a net improvement overall in a website-based situation.

I could see it being useful to set a 'notice_on_high_cost_query' variable where someone working in a data warehouse situation would get a notice if the query he's hand-crafting has a very high cost (in which case he could ctrl-c it if he thinks something is wrong, rather than waiting 5 hours before realizing he forgot a join clause), but the website with the one rogue query run by one user seems a stretch.

Thanks, Stephen
Re: [PERFORM] TB-sized databases
On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote:
> Given that this list spends all day every day discussing cases where the planner is wrong, I'd have to think that that's a bet I wouldn't take.
>
> You could probably avoid this risk by setting the cutoff at something like 100 or 1000 times what you really want to tolerate, but how useful is it then?

It would still be useful in the sense that if the planner is making wrong estimates, you must correct it somehow: raise the statistics target, rewrite the query, or do other tweaking; you should do something. An error is sometimes better than gradually decreasing performance caused by, for example, a statistics target that is too low. So if the error is thrown because of a wrong estimate, it is still a valid error, raising a signal that the DBA has to do something about it.

It's still true that if the planner estimates too low, it will raise no error and will take the resources. But that's just what we have now, so it wouldn't be a regression of any kind...

Cheers, Csaba.
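For example, the first of those corrections is a one-liner (table and column names are made up; 10 is the default target in this era):

    ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
    ANALYZE orders;
    -- or raise default_statistics_target in postgresql.conf and re-ANALYZE everything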
[PERFORM] Configuring a Large RAM PostgreSQL Server
Does anyone have any white papers or basic guides for a large RAM server?

We are consolidating two databases, which currently run on a 4 GB and a 2 GB machine, to enable better data-mining. The data issues on the 4 GB machine are numerous: things like CREATE INDEX and UPDATE queries fail with out-of-memory issues. Re-factoring the data is helping, but isn't finishing the job.

The new machine will have 48 GB of RAM, so figuring out starting points for the Shared Buffers and Work_mem/Maintenance_work_mem is going to be a crap shoot, since the defaults still seem to be based upon 256MB of RAM or less.

Usage: Most of the time, the database is being hit with a handful of poorly written and unoptimized queries from a Ruby on Rails app that is being refactored as a simple Ruby-DBI app, since we need to support our legacy code but don't need the Rails environment, just a lot of SQL writes. Some stored procedures should streamline this. However, each transaction will do about 5 or 6 writes.

Common usage: we have a reporting tool that is also being refactored, but it does a lot of aggregate queries. None of these take more than 500 ms after indexing on the 2 GB database, so we are assuming that more RAM should help and eliminate the problems.

Problem usage: we have a 20GB table with 120m rows that we are splitting into some sub-tables. Generally, we do large data pulls from here, 1 million - 4 million records at a time, stored in a new table for export. These queries are problematic because we are unable to index the database for the queries that we run, because we get out-of-memory errors. Most of my cleanup has resorted to FOR-IN loops via PL/pgSQL to manage the data one row at a time. This is problematic because many of these scripts are taking 4-5 days to run.

Other usage: we will import between 10k and 10m rows at one time out of CSVs into the big database table. I got my gig here because this was all failing and the data was becoming worthless. These imports involve a lot of writes.

Our simultaneous queries are small and currently run acceptably. It's the big imports, data-mining pulls, and system manipulation where we routinely wait days on the query that we are looking to speed up.

Thanks,
Alex
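As an aside on the row-at-a-time FOR-IN loops: where a pull can be expressed as a single set-based statement, the server gets to use whatever memory and indexes are available instead of paying per-row overhead millions of times. A minimal sketch, with entirely made-up table and column names:

    -- instead of looping over big_table one row at a time in PL/pgSQL:
    INSERT INTO export_batch (id, payload)
    SELECT id, payload
      FROM big_table
     WHERE pulled_at BETWEEN '2007-01-01' AND '2007-03-31';
    -- one statement, one scan, no per-row loop overhead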
Re: [PERFORM] Configuring a Large RAM PostgreSQL Server
Alex Hochberger wrote:
> Does anyone have any white papers or basic guides for a large RAM server?
>
> We are consolidating two databases, which currently run on a 4 GB and a 2 GB machine, to enable better data-mining. The data issues on the 4 GB machine are numerous: things like CREATE INDEX and UPDATE queries fail with out-of-memory issues.
>
> Problem usage: we have a 20GB table with 120m rows that we are splitting into some sub-tables. Generally, we do large data pulls from here, 1 million - 4 million records at a time, stored in a new table for export. These queries are problematic because we are unable to index the database for the queries that we run, because we get out-of-memory errors.

Would it not make sense to find out why you are getting these errors first? It's not normal to get out of memory when rebuilding an index.

-- Richard Huxton
Archonet Ltd
Re: [PERFORM] Configuring a Large RAM PostgreSQL Server
It's not on rebuilding the index, it's on CREATE INDEX.

I attribute it to wrong settings, Ubuntu bizarre-ness, and general problems. We need new hardware; the servers are running on aging infrastructure, and we decided to get a new system that will last us the next 3-4 years all at once. But many large queries are getting Out of Memory errors.

Alex

On Nov 29, 2007, at 2:15 PM, Richard Huxton wrote:
> Alex Hochberger wrote:
>> Does anyone have any white papers or basic guides for a large RAM server? [...]
>> These queries are problematic because we are unable to index the database for the queries that we run, because we get out-of-memory errors.
>
> Would it not make sense to find out why you are getting these errors first? It's not normal to get out of memory when rebuilding an index.
>
> -- Richard Huxton
> Archonet Ltd
Re: [PERFORM] Configuring a Large RAM PostgreSQL Server
On Nov 29, 2007, at 2:15 PM, Richard Huxton wrote:
> Alex Hochberger wrote:
>> Problem usage: we have a 20GB table with 120m rows that we are splitting into some sub-tables. Generally, we do large data pulls from here, 1 million - 4 million records at a time, stored in a new table for export. These queries are problematic because we are unable to index the database for the queries that we run, because we get out-of-memory errors.
>
> Would it not make sense to find out why you are getting these errors first?

Alex Hochberger wrote:
> It's not on rebuilding the index, it's on CREATE INDEX.
>
> I attribute it to wrong settings, Ubuntu bizarre-ness, and general problems.

Please do not top-post. I reformatted your message for clarity.

Richard is still correct: it is not normal to get out-of-memory errors during index building, regardless of the age of the servers or the Linux distro. Perhaps you just have a maintenance_work_mem setting that's too large for your server.

-- Alvaro Herrera   http://www.advogato.org/person/alvherre
"One can defend oneself against attacks; against praise one is defenseless."
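If an oversized maintenance_work_mem is indeed the culprit, it can be checked and adjusted for just the session doing the build; index and table names below are placeholders, and memory units like '512MB' assume 8.2 or later:

    SHOW maintenance_work_mem;                         -- what CREATE INDEX is currently allowed to use
    SET maintenance_work_mem = '512MB';                -- applies to this session only
    CREATE INDEX big_table_foo_idx ON big_table (foo);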
Re: [PERFORM] GiST indexing tuples
On Thu, Nov 29, 2007 at 03:23:10PM -0500, Matthew T. O'Connor wrote:
> Sorry in advance if this is a stupid question, but how is this better than two indexes, one on a and one on b? I suppose there could be a space savings, but beyond that?

You could index on both columns simultaneously without a bitmap index scan.

/* Steinar */
-- Homepage: http://www.sesse.net/
Re: [PERFORM] GiST indexing tuples
Matthew wrote:
> For instance, the normal B-tree index on (a, b) is able to answer queries like "a = 5 AND b > 1" or "a > 5". An R-tree would be able to index these, plus queries like "a > 5 AND b < 1".

Sorry in advance if this is a stupid question, but how is this better than two indexes, one on a and one on b? I suppose there could be a space savings, but beyond that?
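A concrete way to see the difference, assuming the contrib btree_gist module is installed so plain scalar columns get GiST operator classes (table and column names are made up):

    -- two single-column b-trees can only be combined through a bitmap AND:
    CREATE INDEX t_a_idx ON t (a);
    CREATE INDEX t_b_idx ON t (b);

    -- one multicolumn GiST index can answer a range condition on both
    -- columns in a single index scan:
    CREATE INDEX t_ab_gist ON t USING gist (a, b);
    SELECT * FROM t WHERE a > 5 AND b < 1;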
Re: [PERFORM] clear pg_stats
Campbell, Lance wrote:
> How can I clear the pg_stats views without restarting PostgreSQL? I thought there was a function.

SELECT pg_stat_reset();

-- Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PERFORM] clear pg_stats
Campbell, Lance wrote:
> How can I clear the pg_stats views without restarting PostgreSQL? I thought there was a function.

pg_stat_reset()

//Magnus
[PERFORM] clear pg_stats
How can I clear the pg_stats views without restarting PostgreSQL? I thought there was a function.

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
Re: [PERFORM] TB-sized databases
Simon Riggs wrote:
> On Wed, 2007-11-28 at 14:48 +0100, Csaba Nagy wrote:
>> In fact an even more useful option would be to ask the planner to throw error if the expected cost exceeds a certain threshold...
>
> Well, I've suggested it before: statement_cost_limit on pgsql-hackers, 1 March 2006. Would people like me to re-write and resubmit this patch for 8.4?
>
> Tom's previous concerns were along the lines of "How would you know what to set it to?", given that the planner costs are mostly arbitrary numbers. Any bright ideas, or is it "we want it and we don't care about the possible difficulties"?

Knowing how to set it is a problem - but a possibly bigger one is that monster query crippling your DW system, so I'd say let's have it.

Cheers
Mark
Re: [PERFORM] Configuring a Large RAM PostgreSQL Server
Alex,

> The new machine will have 48 GB of RAM, so figuring out starting points for the Shared Buffers and Work_mem/Maintenance_work_mem is going to be a crap shoot, since the defaults still seem to be based upon 256MB of RAM or less.

Why a crap shoot?

Set shared_buffers to 12GB. Set work_mem to 20GB / # of concurrent active connections (check your logs). Set Maint_mem to 2GB (I don't think we can actually use more). Then tune from there.

Also, use 8.2 or later, and you'd better compile 64-bit.

-- Josh Berkus
PostgreSQL @ Sun
San Francisco
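Put into postgresql.conf terms, those starting points might look roughly like the snippet below; the 40-connection figure is purely an assumed example for the work_mem arithmetic, and memory units like these need 8.2 or later:

    shared_buffers = 12GB          # about a quarter of the 48 GB of RAM
    work_mem = 512MB               # ~20 GB / ~40 concurrent active connections (assumed)
    maintenance_work_mem = 2GB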
Re: [PERFORM] TB-sized databases
Simon Riggs wrote:
> On Tue, 2007-11-27 at 18:06 -0500, Pablo Alcaraz wrote:
>> Simon Riggs wrote:
>>> All of those responses have cooked up quite a few topics into one. Large databases might mean text warehouses, XML message stores, relational archives and fact-based business data warehouses. The main thing is that TB-sized databases are performance critical. So it all depends upon your workload really as to how well PostgreSQL, or any other RDBMS vendor, can handle them. Anyway, my reason for replying to this thread is that I'm planning changes for PostgreSQL 8.4+ that will allow us to get bigger and faster databases. If anybody has specific concerns then I'd like to hear them so I can consider those things in the planning stages.
>>
>> It would be nice to do something with selects so we can recover a rowset on huge tables using criteria with indexes, without falling back to running a full scan. In my opinion, by definition, a huge database sooner or later will have tables far bigger than the RAM available (same for their indexes). I think the queries need to be solved using indexes smart enough to be fast on disk.
>
> OK, I agree with this one. I'd thought that index-only plans were only for OLTP, but now I see they can also make a big difference with DW queries. So I'm very interested in this area now.

If that's true, then you want to get behind the work Gokulakannan Somasundaram (http://archives.postgresql.org/pgsql-hackers/2007-10/msg00220.php) has done in relation to thick indexes. I would have thought that concept particularly useful in DW. Only having to scan indexes on a number of join tables would be a huge win for some of these types of queries.

My tiny point of view would say that is a much better investment than setting up the proposed parameter. I can see the use of the parameter, though. Most of the complaints about indexes carrying visibility are about update/delete contention. I would expect that in a DW those things aren't in the critical path like they are in many other applications. Especially with partitioning, where previous partitions don't get many updates, I would think there could be great benefit. I would think that many of Pablo's requests up-thread would get significant performance benefit from this type of index.

But as I mentioned at the start, that's my tiny point of view and I certainly don't have the resources to direct what gets looked at for PostgreSQL.

Regards
Russell Smith