> On 06/28/2011 05:28 PM, Craig McIlwee wrote:
> > Autovacuum is disabled for these tables since the data is never updated. The tables that we are testing with at the moment will not grow any larger and have been both clustered and analyzed.
>
> Note that any such prep to keep from ever needing to maintain these tables in the future should include the FREEZE option, possibly with some parameters tweaked first to make it more aggressive. Autovacuum will eventually revisit them in order to prevent transaction ID wrap-around, even if it's disabled. If you're going to the trouble of prepping them so they are never touched again, you should do a freeze with the right parameters to keep this from happening again.
>
> > work_mem: 512MB
> > shared_buffers: 64MB, 512MB, and 1024MB, each yielded the same query plan and took the same amount of time to execute give or take a few seconds
>
> shared_buffers doesn't normally impact the query plan; it impacts how much churn there is between the database and the operating system cache, mainly important for making write-heavy work efficient. On Windows, you'll probably be safe to set this to 512MB and forget about it. It doesn't benefit from large values anyway.
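For reference, a minimal sketch of the freeze step described above, assuming a hypothetical static daily table named daily_20110601 (substitute the real table names):

    -- Freeze every tuple now so the anti-wraparound autovacuum has nothing left
    -- to do on this static table.  The table name is a placeholder.
    VACUUM FREEZE daily_20110601;

    -- Afterwards, age(relfrozenxid) should be near zero, meaning autovacuum will
    -- not need to revisit the table for roughly two billion more transactions.
    SELECT relname, age(relfrozenxid) AS xid_age
      FROM pg_class
     WHERE relname = 'daily_20110601';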
I was thinking that shared buffers controlled the amount of data, primarily table and index pages, that the database could store in memory at once. Based on that assumption, I thought that a larger value would enable an entire table + index to be in memory together and speed up the query. Am I wrong?

> This is a very large work_mem setting however, so be careful that you won't have many users connecting at once if you're going to use it. Each connection can use a multiple of work_mem, making it quite possible you could run out of memory with this configuration. If that low user count is true, you may want to make sure you're enforcing it by lowering max_connections, as a safety measure to prevent problems.

I plan on lowering this quite a bit since I haven't seen much of a boost by increasing it.

> > Since the daily tables are only 360MB, I would hope that the entire table could be pulled into memory with one large sequential read. Of course this assumes that the file pieces are stored contiguously, but auto defrag is enabled and shows low fragmentation so I'm trusting (as much as one can) Windows to do the right thing here. My drives have a 150MB/s sustained max throughput, and considering that data is spread across 5 drives I would hope to at least be able to reach the single disk theoretical limit and read an entire table plus the index into memory in about 4 to 5 seconds. Based on the analyze output, each daily table averages 6 to 7 seconds, so I'm pretty close there and maybe just limited by disk speed?
>
> One thing to note is that your drive speed varies based on what part of the disk things are located at; the slower parts of the drive will be much less than 150MB/s.
>
> On Linux servers it's impossible to reach something close to the disk's raw speed without making the operating system read-ahead feature much more aggressive than it is by default. Because PostgreSQL fetches a single block at a time, to keep the drive completely busy something has to notice the pattern of access and be reading data ahead of when the database even asks for it. You may find a parameter you can tune in the properties for the drives somewhere in the Windows Control Panel. And there's a read-ahead setting on your PERC card that you may not have turned on; it's better than nothing (not as good as the Linux one, but it's useful). There are two useful settings there ("on" and "adaptive" if I recall correctly) that you can try, to see which works better.

Looks like they are set to adaptive read-ahead now. If the database is executing many concurrent queries, is it reasonable to suspect that the IO requests will compete with each other in such a way that the controller rarely sees many sequential requests, since it is serving many processes? The controller also has an 'on' option that forces read-ahead; maybe that would solve the issue if we can rely on the data to survive in the cache until the actual read request takes place.

> > I've read documentation that says I should be able to set statistics values for an entire table as opposed to per column, but haven't found how to do that. I guess I was either too lazy to update statistics on each column or just didn't think it would help much.
>
> You can adjust the statistics target across the entire database using the default_statistics_target setting, or you can tweak them per column using ALTER TABLE. There is no table-level control.
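As a minimal sketch of the per-column route described above (table and column names here are placeholders, not the real schema):

    -- Since there is no table-level setting, raise the target on the columns the
    -- reports actually filter and join on, then re-analyze.
    ALTER TABLE daily_20110601 ALTER COLUMN event_time SET STATISTICS 500;
    ANALYZE daily_20110601;

    -- Compare what the planner now believes against the real data distribution.
    SELECT attname, n_distinct, most_common_vals
      FROM pg_stats
     WHERE tablename = 'daily_20110601';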
> I find it difficult to answer questions about whether there is enough stats or not without actually looking at pg_stats to see how the database is interpreting the data, and comparing it against the real distribution. This is an area where flailing about trying things doesn't work very well; you need to be very systematic about the analysis and testing strategy if you're going to get anywhere useful. It's not easy to do.
>
> As a larger commentary on what you're trying to do, applications like this often find themselves at a point one day where you just can't allow arbitrary user queries to run against them anymore. What normally happens then is that the most common things that people really need end up being run once and stored in some summary form, using techniques such as materialized views: http://wiki.postgresql.org/wiki/Materialized_Views
>
> In your case, I would start now on trying to find the common patterns to the long running reports that people generate, and see if it's possible to pre-compute some portion of them and save that summary. And you may find yourself in a continuous battle with business requests regardless. It's often key decision makers who feel they should be able to query any way they want, regardless of its impact on the database. Finding a middle ground there is usually challenging.
>
> --
> Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
> Comprehensive and Customized PostgreSQL Training Classes:
> http://www.2ndquadrant.us/postgresql-training/

Thanks,
Craig
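As a rough starting point for the summary-table idea above, here is a minimal sketch; every table and column name in it is hypothetical, and a real version would be shaped by whichever reports people actually run:

    -- Pre-compute a daily roll-up once instead of re-aggregating the raw rows
    -- for every report.  All names here are made up for illustration.
    CREATE TABLE daily_summary AS
    SELECT event_date,
           sensor_id,
           count(*)     AS event_count,
           max(reading) AS max_reading
      FROM events
     GROUP BY event_date, sensor_id;

    CREATE INDEX daily_summary_date_idx ON daily_summary (event_date);

    -- Refresh on a schedule (cron, pgAgent) by truncating and repopulating,
    -- along the lines of the manually maintained materialized views described
    -- on the wiki page linked above.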