Re: [PERFORM] default_statistics_target
On Mon, Mar 22, 2010 at 6:19 PM, Carlo Stonebanks stonec.regis...@sympatico.ca wrote:

> Thanks for the insight. How much more of a server's resources will be consumed by an ANALYZE with default_statistics_target = 100?

I don't think it will be much of a problem, especially since autovacuum will do only the tables that need it, and not all at the same time. But you can certainly try it. Before changing the global setting, try just changing it for one session with SET (some_table is a placeholder for one of your own tables):

    \timing
    ANALYZE some_table;                     -- timed at the current target
    SET default_statistics_target = 100;
    ANALYZE some_table;                     -- timed again at the higher target
    \q
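To confirm that the larger target actually changed what got stored, something along these lines should do it; the stored histograms should get bigger after the second ANALYZE (a sketch, assuming array_upper works against the anyarray columns of pg_stats, with some_table again a placeholder):

    -- number of histogram buckets stored per column of some_table;
    -- this should grow with the statistics target
    SELECT attname, array_upper(histogram_bounds, 1) AS histogram_buckets
    FROM pg_stats
    WHERE tablename = 'some_table';

...Robert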
Re: [PERFORM] default_statistics_target
Hi Greg,

Thanks for the insight. How much more of a server's resources will be consumed by an ANALYZE with default_statistics_target = 100?

We have two environments hosting the same data. One is our live server, which serves the web site and hosts our published data in not more than 200 - 300 tables.

PRODUCTION: The data warehouse, consisting of our published data as well as the input resources which are transformed via ETL processes into our published data. It is these input resources which currently consist of about 8,000 tables and growing. They don't really require analysis, as they are typically read linearly, and rarely more than once, when imported. They are kept for auditing and rollbacks.

LIVE: Hosts just the published data, copied over from the production server. Because the data does not get written to very often, older stats from ANALYZE are likely to still be valid. Our concern is that with the older setting of default_statistics_target = 10, ANALYZE has not gone deep enough into these tables (numbering in the millions of rows) to really represent the data distribution properly.

> Given that it looks like you're running 8.3 from past messages I've seen from you, I'd also be concerned that you've overrun your max_fsm_pages, so that VACUUM is growing increasingly ineffective for you, and that's contributing to your headache.

Below are the config values of our production server (those not listed are still stubbed out). Sadly, in an attempt to improve the server's performance, someone wiped out all of the changes I had made to date, along with the comments indicating previous values, the reasons for the changes, etc. What do they call that again? Oh, yeah. Documentation.

# CENTOS 5.4
# Linux mdx_octo 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
# pgsql 8.3.10, 8 CPUs, 48GB RAM
# RAID 10, 4 Disks
autovacuum = on                         # enable autovacuum subprocess? 'on'
autovacuum_analyze_scale_factor = 0.05  # fraction of table size before analyze
autovacuum_analyze_threshold = 1000
autovacuum_naptime = 1min               # time between autovacuum runs
autovacuum_vacuum_cost_delay = 50       # default vacuum cost delay for autovacuum
autovacuum_vacuum_scale_factor = 0.2    # fraction of table size before vacuum
autovacuum_vacuum_threshold = 1000
bgwriter_lru_maxpages = 100             # 0-1000 max buffers written/round
checkpoint_segments = 128               # in logfile segments, min 1, 16MB each
checkpoint_warning = 290s               # 0 is off
client_min_messages = debug1            # values in order of decreasing detail
datestyle = 'iso, mdy'
default_statistics_target = 250         # range 1-1000
default_text_search_config = 'pg_catalog.english'
lc_messages = 'C'                       # locale for system error messages
lc_monetary = 'C'                       # locale for monetary formatting
lc_numeric = 'C'                        # locale for number formatting
lc_time = 'C'                           # locale for time formatting
listen_addresses = '*'                  # what IP address(es) to listen on
log_destination = 'stderr'              # valid values are combinations of stderr, csvlog, syslog
log_error_verbosity = verbose           # terse, default, or verbose messages
log_line_prefix = '%t '                 # special values: %t = timestamp
log_min_error_statement = debug1        # values in order of decreasing detail
log_min_messages = debug1               # values in order of decreasing detail
logging_collector = on                  # enable capturing of stderr and csvlog
maintenance_work_mem = 256MB
max_connections = 100                   # (change requires restart)
max_fsm_relations = 1000                # min 100, ~70 bytes each
max_locks_per_transaction = 128         # min 10
port = 5432                             # (change requires restart)
shared_buffers = 4096MB
shared_preload_libraries = '$libdir/plugins/plugin_debugger.so'  # (change requires restart)
track_counts = on
vacuum_cost_delay = 5                   # 0-1000 milliseconds
wal_buffers = 4MB
wal_sync_method = open_sync
work_mem = 64MB
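Re the max_fsm_pages concern: I'll run a database-wide VACUUM VERBOSE and look at the free-space-map summary it prints at the end. As I understand it, on 8.3 the output tail looks roughly like the comments below (the numbers are purely illustrative), with a NOTICE if the limit has been overrun:

    -- run as superuser in psql; the tail of the output summarizes FSM usage
    VACUUM VERBOSE;

    -- INFO:  free space map contains 198312 pages in 946 relations
    -- DETAIL: A total of 204800 page slots are in use (including overhead).
    --         309664 page slots are required to track all free space.
    -- Current limits are: 204800 page slots, 1000 relations, using 1265 kB.
    -- NOTICE: number of page slots needed (309664) exceeds max_fsm_pages (204800)

Carlo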
Re: [PERFORM] default_statistics_target
Carlo Stonebanks wrote:

> The whole topic of messing with stats makes my head spin, but I am concerned about some horridly performing queries that have had bad row estimates, and others which always choose seq scans when indexes are available. Reading up on how to improve planner estimates, I have seen references to default_statistics_target being changed from the default of 10 to 100. Our DB is large, with thousands of tables, but the core schema has about 100 tables and the typical row counts are in the millions of rows for the whole table. We have been playing endless games with tuning this server - but with all of the suggestions, I don't think the issue of changing default_statistics_target has ever come up. Realizing that there is a performance hit associated with ANALYZE, are there any other downsides to increasing this value to 100, and is this a common setting for large DBs?

From PostgreSQL 8.3 to 8.4, the default value for default_statistics_target changed from 10 to 100. I would take that as a very strong indication that 100 is perceived to be a reasonable value by many knowledgeable people.

High values of that parameter are advisable if good performance of nontrivial queries is the most important thing in your database (like in a data warehouse) and the cost of ANALYZE is only secondary.
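If only a handful of columns are behind the bad plans, you can also raise the target selectively rather than globally, which keeps the extra ANALYZE cost confined to where it helps; for example (table and column names are placeholders):

    -- collect more detailed statistics for one problematic column only
    ALTER TABLE core_table ALTER COLUMN customer_id SET STATISTICS 1000;
    -- re-collect statistics so the new target takes effect
    ANALYZE core_table;

Yours,
Laurenz Albe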
Re: [PERFORM] default_statistics_target
Carlo Stonebanks wrote:

> The whole topic of messing with stats makes my head spin, but I am concerned about some horridly performing queries that have had bad row estimates, and others which always choose seq scans when indexes are available. Reading up on how to improve planner estimates, I have seen references to default_statistics_target being changed from the default of 10 to 100. Our DB is large, with thousands of tables

Stop right there for a second. Are you sure autovacuum is working well here? With thousands of tables, it wouldn't surprise me to discover your planner estimates are wrong because there hasn't been a recent enough ANALYZE on the relevant tables. If you haven't already, take a look at pg_stat_user_tables and make sure that the tables with the bad estimates have actually been analyzed recently. A look at the live/dead row counts there should be helpful as well.

If all that's recent but you're still getting bad estimates, only then would I suggest trying an increase to default_statistics_target. In the situation where autovacuum isn't keeping up with some tables because you have thousands of them, increasing the stats target can actually make the problem worse, because the tables that do get analyzed will take longer to process: there is more statistics work to be done per table.

Given that it looks like you're running 8.3 from past messages I've seen from you, I'd also be concerned that you've overrun your max_fsm_pages, so that VACUUM is growing increasingly ineffective for you, and that's contributing to your headache.
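A query along these lines should show both things at once, the last time each table was analyzed and the live/dead row counts (just a sketch; filter it down to the tables you care about):

    -- last manual/automatic analyze time and row counts per table
    SELECT relname,
           last_analyze,
           last_autoanalyze,
           n_live_tup,
           n_dead_tup
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC;

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.us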
[PERFORM] default_statistics_target
Hi people,

The whole topic of messing with stats makes my head spin, but I am concerned about some horridly performing queries that have had bad row estimates, and others which always choose seq scans when indexes are available. Reading up on how to improve planner estimates, I have seen references to default_statistics_target being changed from the default of 10 to 100.

Our DB is large, with thousands of tables, but the core schema has about 100 tables and the typical row counts are in the millions of rows for the whole table. We have been playing endless games with tuning this server - but with all of the suggestions, I don't think the issue of changing default_statistics_target has ever come up. Realizing that there is a performance hit associated with ANALYZE, are there any other downsides to increasing this value to 100, and is this a common setting for large DBs?
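For what it's worth, the way we spot the bad estimates is by comparing the planner's estimated row count against the actual one in EXPLAIN ANALYZE output; a sketch of the kind of thing we see (table name, predicate, and numbers are made up for illustration):

    -- estimated vs. actual rows; a large gap points to stale or too-shallow statistics
    EXPLAIN ANALYZE
    SELECT *
    FROM core_table
    WHERE status = 'active';

    -- Seq Scan on core_table  (cost=0.00..123456.00 rows=1000 width=64)
    --                         (actual time=0.040..5123.456 rows=2500000 loops=1)
    -- rows=1000 estimated vs. rows=2500000 actual: a badly off estimate

Thanks,
Carlo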