Re: [PERFORM] default_statistics_target

2010-03-25 Thread Robert Haas
On Mon, Mar 22, 2010 at 6:19 PM, Carlo Stonebanks
stonec.regis...@sympatico.ca wrote:
 Thanks for the insight. How much more of a server's resources will be
 consumed by an ANALYZE with default_statistics_target = 100?

I don't think it will be much of a problem, especially since
autovacuum will do only the tables that need it, and not all at the
same time.  But you can certainly try it.  Before changing the global
setting, try just changing it for one session with SET:

\timing
ANALYZE some_table;                   -- some_table: one of the tables with bad estimates
SET default_statistics_target = 100;  -- affects only this session
ANALYZE some_table;                   -- the same table again, at the higher target
\q
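
A slightly fuller sketch of the same test (table and column names here are
hypothetical) that also checks whether the planner's row estimates actually
improve with the deeper statistics:

\timing
ANALYZE core_table;                              -- at the current target
EXPLAIN SELECT * FROM core_table WHERE customer_id = 42;

SET default_statistics_target = 100;             -- session-local change
ANALYZE core_table;                              -- re-sample at the higher target
EXPLAIN SELECT * FROM core_table WHERE customer_id = 42;

Comparing the estimated row counts in the two EXPLAIN outputs against the
real count from a SELECT count(*) shows whether the extra statistics help,
before anything is changed in postgresql.conf.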

...Robert



Re: [PERFORM] default_statistics_target

2010-03-22 Thread Carlo Stonebanks

HI Greg,


Thanks for the insight. How much more of a server's resources will be 
consumed by an ANALYZE with default_statistics_target = 100?


We have two environments hosting the same data. One is our live server, 
which serves the web site; it hosts only our published data, no more than 
200-300 tables.


PRODUCTION: The data warehouse, consisting of our published data as well as 
the input resources that are transformed into it via ETL processes. It is 
these input resources that currently number about 8,000 tables and growing. 
They don't really require analysis: they are typically read linearly during 
import, rarely more than once, and are kept only for auditing and rollbacks.


LIVE: Hosts just the published data, copied over from the production server. 
Because the data is not written to very often, older stats from ANALYZE are 
likely to still be valid. Our concern is that, with the old setting of 
default_statistics_target = 10, ANALYZE has not sampled deeply enough into 
these tables (numbering in the millions of rows) to represent the data 
distribution properly.
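
For reference, a minimal way to see how deep the current statistics actually
go on one of these tables (table and schema names are hypothetical):

SELECT attname, n_distinct, most_common_vals, histogram_bounds
FROM pg_stats
WHERE schemaname = 'public' AND tablename = 'published_table';

With a target of 10, the most_common_vals and histogram_bounds arrays hold
only around ten entries per column; raising the target to 100 lets ANALYZE
keep roughly ten times as much detail about the distribution.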


Given that it looks like you're running 8.3 from past messages I've seen 
from you, I'd also be concerned that you've overrun your max_fsm_pages, so 
that VACUUM is growing increasingly ineffective for you, and that's 
contributing to your headache.


Below are the config values of our production server (those not listed are 
stubbed out). Sadly, in an attempt to improve the server's performance, 
someone wiped out all of the changes I had made to date, along with comments 
indicating previous values, reasons for the changes, etc. What do they call 
that again? Oh, yeah. Documentation.


# CENTOS 5.4
# Linux mdx_octo 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

# pgsql 8.3.10, 8 CPUs, 48GB RAM
# RAID 10, 4 Disks
autovacuum = on   # Enable autovacuum subprocess?  'on'
autovacuum_analyze_scale_factor = 0.05 # fraction of table size before analyze

autovacuum_analyze_threshold = 1000
autovacuum_naptime = 1min  # time between autovacuum runs
autovacuum_vacuum_cost_delay =  50 # default vacuum cost delay for
autovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuum
autovacuum_vacuum_threshold = 1000
bgwriter_lru_maxpages = 100  # 0-1000 max buffers written/round
checkpoint_segments = 128 # in logfile segments, min 1, 16MB each
checkpoint_warning = 290s  # 0 is off
client_min_messages =  debug1 # values in order of decreasing detail:
datestyle = 'iso, mdy'
default_statistics_target = 250 # range 1-1000
default_text_search_config = 'pg_catalog.english'
lc_messages = 'C'   # locale for system error message
lc_monetary = 'C'   # locale for monetary formatting
lc_numeric = 'C'   # locale for number formatting
lc_time = 'C'# locale for time formatting
listen_addresses = '*'  # what IP address(es) to listen on;
log_destination = 'stderr'  # Valid values are combinations of
log_error_verbosity =  verbose  # terse, default, or verbose messages
log_line_prefix = '%t '   # special values:
log_min_error_statement =  debug1 # values in order of decreasing detail:
log_min_messages = debug1  # values in order of decreasing detail:
logging_collector = on  # Enable capturing of stderr and csvlog
maintenance_work_mem = 256MB
max_connections = 100   # (change requires restart)
max_fsm_relations = 1000  # min 100, ~70 bytes each
max_locks_per_transaction = 128  # min 10
port = 5432# (change requires restart)
shared_buffers = 4096MB
shared_preload_libraries = '$libdir/plugins/plugin_debugger.so'  # (change requires restart)

track_counts = on
vacuum_cost_delay = 5   # 0-1000 milliseconds
wal_buffers = 4MB
wal_sync_method = open_sync
work_mem = 64MB

Carlo


Greg Smith g...@2ndquadrant.com wrote in message 
news:4b9e33af.2020...@2ndquadrant.com...

Carlo Stonebanks wrote:
The whole topic of messing with stats makes my head spin but I am 
concerned about some horridly performing queries that have had bad row 
estimates and others which always choose seq scans when indexes are 
available. Reading up on how to improve planner estimates, I have seen 
references to default_statistics_target being changed from the default of 
10 to 100.


Our DB is large, with thousands of tables


Stop right there for a second.  Are you sure autovacuum is working well 
here?  With thousands of tables, it wouldn't surprise me to discover your 
planner estimates are wrong because there hasn't been a recent enough 
ANALYZE on the relevant tables.  If you haven't already, take a look at 
pg_stat_user_tables and make sure that tables that have the bad estimates 
have actually been analyzed recently.  A look at the live/dead row counts 
there should be helpful as well.


If all that's recent, but you're still getting bad estimates, only then 
would I suggest trying an increase to default_statistics_target.  In the 
situation where autovacuum isn't keeping up with some tables because you 
have thousands of them, increasing the stats target can actually make the 
problem worse, because the tables that are getting analyzed will take 
longer to process--more statistics work to be done per table.

Re: [PERFORM] default_statistics_target

2010-03-15 Thread Albe Laurenz
Carlo Stonebanks wrote:
 The whole topic of messing with stats makes my head spin but I am concerned 
 about some horridly performing queries that have had bad row estimates and 
 others which always choose seq scans when indexes are available. Reading up 
 on how to improve planner estimates, I have seen references to 
 default_statistics_target being changed from the default of 10 to 100.
 
 Our DB is large, with thousands of tables, but the core schema has about 100 
 tables and the typical row counts are in the millions of rows for the whole 
 table. We have been playing endless games with tuning this server - but with 
 all of the suggestions, I don't think the issue of changing 
 default_statistics_target has ever come up. Realizing that there is a 
 performance hit associated with ANALYZE, are there any other downsides to 
 increasing this value to 100, and is this a common setting for large DBs?

From PostgreSQL 8.3 to 8.4, the default value for default_statistics_target
was changed from 10 to 100. I would take that as a very strong indication
that 100 is perceived to be a reasonable value by many knowledgeable people.

High values of that parameter are advisable if good performance of
nontrivial queries is the most important thing in your database
(like in a data warehouse) and the cost of ANALYZE is only secondary.
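
If only a handful of columns in a few hot tables are the ones getting bad
estimates, a middle ground (sketched here with hypothetical names) is to
raise the target per column instead of globally, so the extra ANALYZE cost
stays confined to where it matters:

ALTER TABLE core_table ALTER COLUMN customer_id SET STATISTICS 100;
ANALYZE core_table;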

Yours,
Laurenz Albe



Re: [PERFORM] default_statistics_target

2010-03-15 Thread Greg Smith

Carlo Stonebanks wrote:
The whole topic of messing with stats makes my head spin but I am 
concerned about some horridly performing queries that have had bad 
row estimates and others which always choose seq scans when indexes 
are available. Reading up on how to improve planner estimates, I have 
seen references to default_statistics_target being changed from the 
default of 10 to 100.


Our DB is large, with thousands of tables


Stop right there for a second.  Are you sure autovacuum is working well 
here?  With thousands of tables, it wouldn't surprise me to discover 
your planner estimates are wrong because there hasn't been a recent 
enough ANALYZE on the relevant tables.  If you haven't already, take a 
look at pg_stat_user_tables and make sure that tables that have the bad 
estimates have actually been analyzed recently.  A look at the live/dead 
row counts there should be helpful as well.
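
A minimal query along those lines (assuming the problem tables are ordinary
user tables):

SELECT relname, last_analyze, last_autoanalyze, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY last_autoanalyze NULLS FIRST;

Tables whose last_analyze and last_autoanalyze are both NULL, or that show a
large n_dead_tup relative to n_live_tup, are the first place to look.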


If all that's recent, but you're still getting bad estimates, only then 
would I suggest trying an increase to default_statistics_target.  In the 
situation where autovacuum isn't keeping up with some tables because you 
have thousands of them, increasing the stats target can actually make 
the problem worse, because the tables that are getting analyzed will 
take longer to process--more statistics work to be done per table.


Given that it looks like you're running 8.3 from past messages I've seen 
from you, I'd also be concerned that you've overrun your max_fsm_pages, 
so that VACUUM is growing increasingly ineffective for you, and that's 
contributing to your headache.
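
On 8.3 this is easy to check (sketch below; assumes a database-wide vacuum
pass is acceptable): the summary at the end of a database-wide VACUUM VERBOSE
reports how many free-space-map page slots are needed versus the configured
limits, and warns if max_fsm_pages is too low.

SHOW max_fsm_pages;
SHOW max_fsm_relations;
VACUUM VERBOSE;   -- check the INFO/NOTICE lines at the very end of the output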


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




[PERFORM] default_statistics_target

2010-03-14 Thread Carlo Stonebanks

Hi people,

The whole topic of messing with stats makes my head spin but I am concerned 
about some horridly performing queries that have had bad row estimates and 
others which always choose seq scans when indexes are available. Reading up 
on how to improve planner estimates, I have seen references to 
default_statistics_target being changed from the default of 10 to 100.


Our DB is large, with thousands of tables, but the core schema has about 100 
tables and the typical row counts are in the millions of rows for the whole 
table. We have been playing endless games with tuning this server - but with 
all of the suggestions, I don't think the issue of changing 
default_statistics_target has ever come up. Realizing that there is a 
performance hit associated with ANALYZE, are there any other downsides to 
increasing this value to 100, and is this a common setting for large DBs?


Thanks,

Carlo 


