Re: [PERFORM] shared_buffers advice
2010/3/11 Paul McGarry p...@paulmcgarry.com:
> I'm basically wondering how the postgresql cache (ie shared_buffers) and the OS page cache interact. The general advice seems to be to assign 1/4 of RAM to shared buffers. I don't have a good knowledge of the internals, but I'm wondering if this will effectively mean that roughly the same amount of RAM being used for the OS page cache will be used for redundantly caching something that Postgres is caching as well?

I have a similar problem, but I can't see an answer in this thread.

Our dedicated server has 16 GB RAM. Among other settings, shared_buffers is 2 GB and effective_cache_size is 12 GB.

Do shared_buffers duplicate the contents of the OS page cache? If so, how do I know whether 25% of RAM is the right value for me? Intuitively it would seem not to be - the less redundancy the better.

Another question: is there a tool or built-in statistic that tells when/how often/how much a table is read from disk? I mean physical reads, not pulls from the OS cache into shared_buffers.

--
Konrad Garus

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] shared_buffers advice
On May 24, 2010, at 4:25 AM, Konrad Garus wrote:
> Do shared_buffers duplicate the contents of the OS page cache? If so, how do I know whether 25% of RAM is the right value for me? Intuitively it would seem not to be - the less redundancy the better.

You can look into the pg_buffercache contrib module.

> Another question: is there a tool or built-in statistic that tells when/how often/how much a table is read from disk? I mean physical reads, not pulls from the OS cache into shared_buffers.

Well, the pg_stat_* views tell you how much logical IO is going on, but Postgres has no way of knowing how effective the OS or disk controller caches are.
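For reference, both suggestions can be sketched as queries. This is a minimal, illustrative sketch against the stock pg_buffercache view and the pg_statio_user_tables statistics view (on 8.3/8.4 the contrib module is installed by running its shipped SQL script rather than CREATE EXTENSION); note that heap_blks_read counts blocks requested from the kernel, so it still includes OS-page-cache hits - exactly the limitation described above:

```sql
-- 9.1+; on 8.x, run the pg_buffercache SQL script from contrib instead.
CREATE EXTENSION pg_buffercache;

-- Which relations currently occupy the most shared_buffers pages?
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = c.relfilenode
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;

-- Logical IO per table: blocks found in shared_buffers (hit) vs
-- blocks requested from the kernel (read). "read" does NOT mean
-- physical disk IO -- OS cache hits are counted there too.
SELECT relname, heap_blks_read, heap_blks_hit
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC;
```

The simple relfilenode join skips a few catalog corner cases, but it is the form the pg_buffercache documentation itself uses for this kind of overview.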
Re: [PERFORM] shared_buffers advice
On Wed, Mar 10, 2010 at 9:28 PM, Paul McGarry p...@paulmcgarry.com wrote:
> Hi there, I'm after a little bit of advice on the shared_buffers setting (I have read the various docs on/linked from the performance tuning wiki page; some very helpful stuff there, so thanks to those people).
>
> I am setting up a 64-bit Linux server running PostgreSQL 8.3; the server has 64 gigs of memory and Postgres is the only major application running on it. (This server is to go alongside some existing 8.3 servers; we will look at 8.4/9 migration later.)
>
> I'm basically wondering how the postgresql cache (ie shared_buffers) and the OS page cache interact. The general advice seems to be to assign 1/4 of RAM to shared buffers. I don't have a good knowledge of the internals, but I'm wondering if this will effectively mean that roughly the same amount of RAM being used for the OS page cache will be used for redundantly caching something that Postgres is caching as well? IE when Postgres reads something from disk it will go into both the OS page cache and the PostgreSQL shared_buffers, and the OS page cache copy is unlikely to be useful for anything.
>
> If that is the case, what are the downsides to having less overlap between the caches, IE heavily favouring one or the other, such as allocating shared_buffers a much larger percentage (such as 90-95% of expected 'free' memory)?

I've personally heard tons of anecdotal evidence wrt the shared_buffers setting. There is a bit of benchmarking info suggesting you can eke out marginal gains via the shared_buffers setting, but you have to take (unfortunately) OS, hardware, filesystem and other factors all into account.

Here is what I'm pretty confident about saying:

*) A page fault to disk is a much bigger deal than a fault from the PG cache to the OS cache. Many people assume that raising shared_buffers decreases the chance of a disk fault. It doesn't -- at least not in the simple way you would think -- all modern OSes aggressively cache filesystem data already, so we are simply layering over the OS cache. If your database is really big, anything that reduces disk faults is a win and anything that increases them is a loss. tps measurements according to pgbench are not as interesting to me as iops from the disk system.

*) Shared buffer effects are hard to detect in the single-user case. A single 'non disk bound' large query will perform pretty much the same regardless of how you set shared_buffers. In other words, you will not be able to easily measure differences in the setting outside of a real or simulated production workload.

*) shared_buffers is one of the _least_ important performance settings in postgresql.conf. Many settings, like work_mem, planner tweaks, commit settings, and autovacuum settings, can dramatically impact your workload performance in spectacular ways, but tend to be case-by-case specific. shared_buffers affects _everything_, albeit in very subtle ways, so you have to be careful.

*) I sometimes wonder if the OS should just manage everything. We just said goodbye to the FSM (thank goodness for that!) -- what about a fully OS-managed cache? Goodbye SysV IPC? Note my views here are very simplistic -- I don't have anything close to a full understanding of the cache machinery in the database.

merlin
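For a dedicated 64 GB box like the one described, the conventional 1/4-of-RAM starting point translates into a postgresql.conf fragment along these lines. The numbers are illustrative assumptions for that hardware, not recommendations from this thread, and (per the points above) only a starting point for measurement under a real workload:

```
# postgresql.conf -- illustrative starting values, dedicated 64 GB server
shared_buffers = 16GB          # ~25% of RAM, the usual starting point
effective_cache_size = 48GB    # planner hint: shared_buffers + expected OS page cache
work_mem = 32MB                # per sort/hash node, per backend -- raise with care
checkpoint_segments = 32       # spread out checkpoint IO (8.x/9.x-era setting)
```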
[PERFORM] which hardware setup
Hello, I work on a web app that sends email newsletters, and I have a question about Postgres performance in two different setups.

Currently we have one 4GB RAM VPS running our app server (a Rails app under nginx and thin) and a 4GB RAM VPS running the database (18GB).

We want to migrate to bare metal servers, but I have drawn up two different setups and don't know which is the better option:

Option 1:
App Server: Dual Xeon 5130 dual core with 4GB RAM and a SATA disk
Postgres: Xeon 3360 quad core with 4GB RAM and 2 x 146GB 15k RPM SAS (RAID1) disks

Option 2:
App Server and Postgres: Dual Xeon 5520 quad core with 12GB RAM and 2 x 146GB 15k RPM SAS (RAID1) disks

I know the first option would be better in terms of I/O for Postgres, but our app server doesn't use much I/O, and with the second option we would have much more RAM.

Thank you
Pedro Axelrud
http://mailee.me http://softa.com.br http://flavors.me/pedroaxl
[PERFORM] Random Page Cost and Planner
Hi,

I wrote a query (see below) that extracts climate data from weather stations within a given radius of a city, using the dates for which those weather stations actually have data. The query uses the measurement table's only index:

CREATE UNIQUE INDEX measurement_001_stc_idx
  ON climate.measurement_001
  USING btree (station_id, taken, category_id);

The value for random_page_cost was 2.0; reducing it to 1.1 gave a massive performance improvement (nearly an order of magnitude). While the results now return in 5 seconds (down from ~85 seconds), problematic lines remain. Bumping the query's end date by a single year causes a full table scan:

  sc.taken_start >= '1900-01-01'::date AND
  sc.taken_end <= '1997-12-31'::date AND

How do I persuade PostgreSQL to use the indexes, regardless of the number of years between the two dates? (A full table scan against 43 million rows is probably not the best plan.) Find the EXPLAIN ANALYSE results below the query.

Thanks again!

Dave

Query:

SELECT
  extract(YEAR FROM m.taken) AS year,
  avg(m.amount) AS amount
FROM
  climate.city c,
  climate.station s,
  climate.station_category sc,
  climate.measurement m
WHERE
  c.id = 5182 AND
  earth_distance(
    ll_to_earth(c.latitude_decimal, c.longitude_decimal),
    ll_to_earth(s.latitude_decimal, s.longitude_decimal)) / 1000 <= 30 AND
  s.elevation BETWEEN 0 AND 3000 AND
  s.applicable = TRUE AND
  sc.station_id = s.id AND
  sc.category_id = 1 AND
  sc.taken_start >= '1900-01-01'::date AND
  sc.taken_end <= '1996-12-31'::date AND
  m.station_id = s.id AND
  m.taken BETWEEN sc.taken_start AND sc.taken_end AND
  m.category_id = sc.category_id
GROUP BY
  extract(YEAR FROM m.taken)
ORDER BY
  extract(YEAR FROM m.taken)

1900 to 1996: Index

Sort (cost=1348597.71..1348598.21 rows=200 width=12) (actual time=2268.929..2268.935 rows=92 loops=1)
  Sort Key: (date_part('year'::text, (m.taken)::timestamp without time zone))
  Sort Method: quicksort Memory: 32kB
  -> HashAggregate (cost=1348586.56..1348590.06 rows=200 width=12) (actual time=2268.829..2268.886 rows=92 loops=1)
    -> Nested Loop (cost=0.00..1344864.01 rows=744510 width=12) (actual time=0.807..2084.206 rows=134893 loops=1)
      Join Filter: ((m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (sc.station_id = m.station_id))
      -> Nested Loop (cost=0.00..12755.07 rows=1220 width=18) (actual time=0.502..521.937 rows=23 loops=1)
        Join Filter: ((sec_to_gc(cube_distance((ll_to_earth((c.latitude_decimal)::double precision, (c.longitude_decimal)::double precision))::cube, (ll_to_earth((s.latitude_decimal)::double precision, (s.longitude_decimal)::double precision))::cube)) / 1000::double precision) <= 30::double precision)
        -> Index Scan using city_pkey1 on city c (cost=0.00..2.47 rows=1 width=16) (actual time=0.014..0.015 rows=1 loops=1)
          Index Cond: (id = 5182)
        -> Nested Loop (cost=0.00..9907.73 rows=3659 width=34) (actual time=0.014..28.937 rows=3458 loops=1)
          -> Seq Scan on station_category sc (cost=0.00..970.20 rows=3659 width=14) (actual time=0.008..10.947 rows=3458 loops=1)
            Filter: ((taken_start >= '1900-01-01'::date) AND (taken_end <= '1996-12-31'::date) AND (category_id = 1))
          -> Index Scan using station_pkey1 on station s (cost=0.00..2.43 rows=1 width=20) (actual time=0.004..0.004 rows=1 loops=3458)
            Index Cond: (s.id = sc.station_id)
            Filter: (s.applicable AND (s.elevation >= 0) AND (s.elevation <= 3000))
      -> Append (cost=0.00..1072.27 rows=947 width=18) (actual time=6.996..63.199 rows=5865 loops=23)
        -> Seq Scan on measurement m (cost=0.00..25.00 rows=6 width=22) (actual time=0.000..0.000 rows=0 loops=23)
          Filter: (m.category_id = 1)
        -> Bitmap Heap Scan on measurement_001 m (cost=20.79..1047.27 rows=941 width=18) (actual time=6.995..62.390 rows=5865 loops=23)
          Recheck Cond: ((m.station_id = sc.station_id) AND (m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (m.category_id = 1))
          -> Bitmap Index Scan on measurement_001_stc_idx (cost=0.00..20.55 rows=941 width=0) (actual time=5.775..5.775 rows=5865 loops=23)
            Index Cond: ((m.station_id = sc.station_id) AND (m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (m.category_id = 1))
Total runtime: 2269.264 ms

1900 to 1997: Full Table Scan

Sort (cost=1370192.26..1370192.76 rows=200 width=12) (actual time=86165.797..86165.809 rows=94 loops=1)
  Sort Key: (date_part('year'::text, (m.taken)::timestamp without time zone))
  Sort Method: quicksort Memory: 32kB
  -> HashAggregate (cost=1370181.12..1370184.62 rows=200 width=12)
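The tuning step described above (random_page_cost 2.0 -> 1.1) can be tried per-session before touching postgresql.conf, which makes it easy to compare the two plans side by side. A sketch, using a deliberately simplified stand-in query rather than the full query from this thread:

```sql
-- Lower random_page_cost for this session only: 1.1 tells the planner
-- that random (index) reads cost barely more than sequential reads,
-- which is appropriate when most of the working set is cached.
SET random_page_cost = 1.1;

EXPLAIN ANALYSE
SELECT extract(YEAR FROM m.taken) AS year, avg(m.amount) AS amount
FROM climate.measurement m
WHERE m.category_id = 1
GROUP BY extract(YEAR FROM m.taken);

-- Revert to the value configured in postgresql.conf:
RESET random_page_cost;
```

If the cheaper plan holds up across representative queries, the setting can then be made permanent in postgresql.conf; SET/RESET keeps the experiment reversible in the meantime.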